FIELD OF THE INVENTION

The present invention relates generally to the cystic
fibrosis (CF) gene and, more particularly, to the
identification, isolation and cloning of the DNA sequence
corresponding to mutants of the CF gene, as well as their
transcripts, gene products and genetic information at
exon/intron boundaries. The present invention also
relates to methods of screening for and detection of CF
carriers, CF diagnosis, prenatal CF screening and
diagnosis, and gene therapy utilizing recombinant
technologies and drug therapy using the information
derived from the DNA, the protein, and its metabolic
function.

BACKGROUND OF THE INVENTION

Cystic fibrosis (CF) is the most common severe
autosomal recessive genetic disorder in the Caucasian
population. It affects approximately 1 in 2000 live
births [..., McGraw Hill, ...]. Approximately 1 in 20
persons are carriers of the disease. Although the disease
was first described in the late 1930's, the basic defect
remains unknown. The major symptoms of cystic fibrosis
include chronic pulmonary disease, pancreatic exocrine
insufficiency, and elevated sweat electrolyte levels. The
symptoms are consistent with cystic fibrosis being an
exocrine disorder. Although recent advances have been
made in the analysis of ion transport across the apical
membrane of the epithelium of CF patient cells, it is not
clear that the abnormal regulation of chloride channels
represents the primary defect in the disease. Given the
lack of understanding of the molecular mechanism of the
disease, an alternative approach has therefore been taken
in an attempt to understand the nature of the molecular
defect through direct cloning of the responsible gene on
the basis of its chromosomal location.
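
The two population figures quoted above (about 1 in 2000 live births affected, about 1 in 20 persons carriers) can be checked against each other. The following is a minimal sketch, assuming Hardy-Weinberg equilibrium for a single autosomal recessive locus; that assumption, and the function name `carrier_frequency`, are illustrative and are not part of the specification.

```python
import math


def carrier_frequency(affected_birth_rate: float) -> float:
    """Expected carrier (heterozygote) frequency 2pq under Hardy-Weinberg,
    for an autosomal recessive disease whose incidence equals q**2."""
    q = math.sqrt(affected_birth_rate)  # frequency of the mutant allele
    p = 1.0 - q                         # frequency of the normal allele
    return 2.0 * p * q


incidence = 1 / 2000                    # figure quoted in the text
carriers = carrier_frequency(incidence)
print(f"carrier frequency is roughly 1 in {1 / carriers:.0f}")
```

Under these assumptions the carrier frequency comes out at a few percent, which is of the same order as the "approximately 1 in 20" stated in the text.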

However, there is no clear phenotype that directs an
approach to the exact nature of the genetic basis of the
disease, or that allows for an identification of the
cystic fibrosis gene. The nature of the CF defect in
relation to the population genetics data has not been
readily apparent. Both the prevalence of the disease and
the clinical heterogeneity have been explained by several
different mechanisms: high mutation rate, heterozygote
advantage, genetic drift, multiple loci, and
reproductive compensation.

Many of the hypotheses cannot be tested due to the
lack of knowledge of the basic defect. Therefore,
alternative approaches to the determination and
characterization of the CF gene have focused on an
attempt to identify the location of the gene by genetic
analysis.

Linkage analysis of the CF gene to antigenic and
protein markers was attempted in the 1950's, but no
positive results were obtained [Steinberg et al., Am. J.
Hum. Genet. ..., (1956); Steinberg and Morton, Am. J.
Hum. Genet. 8: 177-189, (1956); Goodchild et al., J.
Med. Genet. 7: 417-419, 1976].

More recently, it has become possible to use RFLP's
to facilitate linkage analysis. The first linkage of an
RFLP marker to the CF gene was disclosed in 1985 [Tsui et
al., Science 230: 1054-1057, 1985], in which linkage was
found between the CF gene and an uncharacterized marker,
DOCRI-917. The association was found in an analysis of
families with affected CF children. This showed that
although the chromosomal location had not been
established, the location of the disease gene had been
narrowed to about 1% of the human genome, or about 30
million nucleotide base pairs.
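
The "about 30 million nucleotide base pairs" figure follows from simple arithmetic. The sketch below assumes a haploid human genome of roughly 3 billion base pairs; that size is not stated in the text but is implied by it, and the constant and function names are illustrative.

```python
# Assumed haploid human genome size (implied by "1% ... about 30 million").
HAPLOID_GENOME_BP = 3_000_000_000


def region_from_percent(percent: int, genome_bp: int = HAPLOID_GENOME_BP) -> int:
    """Size in base pairs of a region covering `percent` % of the genome."""
    return genome_bp * percent // 100


candidate_region_bp = region_from_percent(1)
print(f"{candidate_region_bp:,} bp")
```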

The chromosomal location of the DOCRI-917 probe was
established using rodent-human hybrid cell lines
containing different human chromosome complements. It
was shown that DOCRI-917 (and therefore the CF gene) maps
to human chromosome 7.

Further physical and genetic linkage studies were
pursued in an attempt to pinpoint the location of the CF
gene. Zengerling et al. [Am. J. Hum. Genet. 40: 228-236
(1987)] describe the use of human-mouse somatic cell
hybrids to obtain a more detailed physical relationship
between the CF gene and the markers known to be linked
with it. This publication shows that the CF gene can be
assigned to either the distal region of band q22 or the
proximal region of band q31 on chromosome 7.

Rommens et al. [Am. J. Hum. Genet. 43: 645-663
(1988)] give a detailed discussion of the isolation of
many new 7q31 probes. The approach outlined led to the
isolation of two new probes, D7S122 and D7S340, which are
close to each other. Pulsed field gel electrophoresis
mapping indicates that these two RFLP markers are between
two markers known to flank the CF gene, MET [White, R.,
Woodward S., Leppert M., et al., Nature 318: 382-384
(1985)] and D7S8 [Wainwright, B. J., Scambler, P. J.,
..., Nature 318: 384-385 (1985)], therefore
in the CF gene region. The discovery of these markers
provides a starting point for chromosome walking and
jumping.

Estivill et al. [Nature 326: 840-845 (1987)] disclose
that a candidate cDNA gene was located and partially
characterized. This, however, does not teach the correct
location of the CF gene. The reference discloses a
candidate cDNA gene downstream of a CpG island; CpG
islands are undermethylated GC nucleotide-rich regions
upstream of many vertebrate genes. The chromosomal
localization of the candidate locus is identified as the
XV2C region. This region is described in a patent
application [...]. However, that region does not include
the CF gene.
A major difficulty in identifying the CF gene has been
the lack of cytologically detectable chromosome
rearrangements or deletions [...].

Such rearrangements and deletions could be observed
cytologically and, as a result, a physical location on a
particular chromosome could be correlated with the
particular disease. Further, this cytological location
could be correlated with a molecular location based on
the known relationship between publicly available DNA
probes and cytologically visible alterations in the
chromosomes. Knowledge of the molecular location of the
gene for a particular disease would allow cloning and
sequencing of that gene by routine procedures,
particularly when the gene product is known [...].

In contrast, neither the cytological location nor the
gene product of the gene for cystic fibrosis was known in
the prior art. With the recent identification of MET and
D7S8, markers which flanked the CF gene but did not
pinpoint its molecular location, the present inventors
devised various novel gene cloning strategies to approach
the CF gene in accordance with the present invention. The
methods employed in these strategies include chromosome
jumping from the flanking markers, cloning of DNA
fragments from a defined physical region with the use of
pulsed field gel electrophoresis, a combination of
somatic [...]. By means of these novel strategies, the
present inventors were able to identify the gene
responsible for cystic fibrosis where the prior art was
uncertain or, even in one case, wrong.

The application of these genetic and molecular
cloning strategies has allowed the isolation and cDNA
cloning of the cystic fibrosis gene on the basis of its
chromosomal location, without the benefit of genomic
rearrangements to point the way. The identification of
the normal and mutant forms of the CF gene and gene
products has allowed for the development of screening and
diagnostic tests for CF utilizing nucleic acid probes and
antibodies to the gene product. Through interaction with
the defective gene product and the pathway in which this
gene product is involved, therapy through normal gene
product supplementation and gene manipulation and
delivery are now made possible.

The gene involved in the cystic fibrosis disease
process, hereinafter the "CF gene" and its functional
equivalents, has been identified, isolated and cDNA
cloned, and its transcripts and gene products identified
and sequenced. A three base pair deletion leading to the
omission of a phenylalanine residue in the gene product
has been determined to correspond to the mutations of the
CF gene in [...], with different mutations involved in
most if not all the remaining cases. This subject matter
is disclosed in a co-pending United States patent
application [...].

According to this invention, other base pair 
With the identification and sequencing of the mutant
gene and its gene product, nucleic acid probes and
antibodies raised to the mutant gene product can be used
in a variety of hybridization and immunological assays to
screen for and detect the presence of either the
defective CF gene or gene product. Assay kits for such
screening and diagnosis can also be provided. The
genetic information derived from the intron/exon
boundaries is also very useful in various screening and
diagnosis procedures.

Patient therapy through supplementation with the
normal gene product, whose production can be amplified
using genetic and recombinant techniques, or its
functional equivalent, is now also possible. Correction
or modification of the defective gene product through
drug treatment means is now possible. In addition,
cystic fibrosis can be cured or controlled through gene
therapy by correcting the gene defect in situ or using
recombinant or other vehicles to deliver a DNA sequence
capable of expression of the normal gene product to the
cells of the patient.

According to another aspect of the invention, a
purified mutant CF gene comprises a DNA sequence encoding
an amino acid sequence for a protein where the protein,
when expressed in cells of the human body, is associated
with altered cell function which correlates with the
genetic disease cystic fibrosis.

According to another aspect of the invention, a
purified RNA molecule comprises an RNA sequence
corresponding to the above DNA sequence.

According to another aspect of the invention, a DNA
molecule comprises a cDNA molecule corresponding to the
above DNA sequence.

According to another aspect of the invention, a DNA
molecule comprises a DNA sequence encoding mutant CFTR
polypeptide having the sequence according to the
following Figure 1 for amino acid residue positions 1 to
1480 as further characterized by nucleotide sequence
variants resulting in deletion or alteration of amino
acids at residue positions 85, 148, 178, 455, 493, 507,
542, 549, 551, 560, 563, 574, 1077 and 1092.

According to another aspect of the invention, a DNA
molecule comprises an intronless DNA sequence encoding a
mutant CFTR polypeptide having the sequence according to
Figure 1 for DNA sequence positions 1 to 4575 and
further characterized by nucleotide sequence variants
resulting in deletion or alteration of DNA at DNA
sequence positions 129, 556, 621+1, 711+1, 1717-1 and
3659.

According to another aspect of the invention, a DNA
molecule comprises a cDNA molecule corresponding to the
above DNA sequence.

According to another aspect of the invention, the
cDNA molecule comprises a DNA sequence selected from the
group consisting of:

(a) DNA sequences which correspond to the mutant
DNA sequence selected from the group of mutant amino acid
positions of 85, 148, 178, 455, 493, 507, 542, 549, 551,
560, 563, 574, 1077 and 1092 and mutant DNA sequence
positions 129, 556, 621+1, 711+1, 1717-1 and 3659 and
which encode, on expression, for mutant CFTR polypeptide;

(b) DNA sequences which correspond to a fragment of
the selected mutant DNA sequence, including at least
twenty nucleotides;

(c) DNA sequences which comprise at least twenty
nucleotides and encode a fragment of the selected mutant
CFTR protein amino acid sequence;

(d) DNA sequences encoding an epitope encoded by at
least eighteen sequential nucleotides in the selected
mutant DNA sequence.

According to another aspect of the invention, a DNA
sequence selected from the group consisting of:

(a) DNA sequences which correspond to portions of
DNA sequences of boundaries of exons/introns of the 
genomic CF gene; 

(b) DNA sequences of at least eighteen sequential 
nucleotides at boundaries of exons/introns of the genomic 
CF gene depicted in Figure 18; and 

(c) DNA sequences of at least eighteen sequential 
nucleotides of intron portions of the genomic CF gene of 
Figure 18. 

According to another aspect of the invention, a
purified nucleic acid probe comprises a DNA or RNA
nucleotide sequence corresponding to the above noted
selected DNA sequences of groups (a) to (c).

According to another aspect of the invention, a
purified RNA molecule comprising an RNA sequence
corresponds to the mutant DNA sequence selected from the
group of mutant protein positions consisting of 85, 148,
178, 455, 493, 507, 542, 549, 551, 560, 563, 574, 1077 and
1092 and of mutant DNA sequence positions consisting of
129, 556, 621+1, 711+1, 1717-1 and 3659.

A purified nucleic acid probe comprises a DNA or
RNA nucleotide sequence corresponding to the mutant
sequences of the above recited group.
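
The DNA sequence positions recited above mix plain numbers (129, 556, 3659) with labels such as 621+1 and 1717-1. A minimal parsing sketch follows, assuming the conventional reading in which "N+k" denotes the k-th intron base after cDNA position N and "N-k" the k-th intron base before it; the text does not spell this convention out, and the names `CdnaPosition` and `parse_position` are hypothetical.

```python
import re
from typing import NamedTuple


class CdnaPosition(NamedTuple):
    cdna_base: int      # nucleotide number in the cDNA numbering
    intron_offset: int  # 0 = inside the cDNA; +k / -k = k bases into the adjacent intron


_LABEL = re.compile(r"^(\d+)([+-]\d+)?$")


def parse_position(label: str) -> CdnaPosition:
    """Split a label such as '621+1' into its cDNA base and intron offset."""
    match = _LABEL.match(label.strip())
    if match is None:
        raise ValueError(f"unrecognized position label: {label!r}")
    return CdnaPosition(int(match.group(1)), int(match.group(2) or 0))


# DNA sequence positions as recited in the text.
DNA_POSITIONS = ["129", "556", "621+1", "711+1", "1717-1", "3659"]
parsed = [parse_position(label) for label in DNA_POSITIONS]

# Labels that, under the assumed convention, fall at exon/intron boundaries.
boundary_labels = [
    label for label, pos in zip(DNA_POSITIONS, parsed) if pos.intron_offset != 0
]
print(boundary_labels)
```

Keeping the offset separate from the cDNA base makes it straightforward to tell variants inside the coding sequence from those at exon/intron boundaries.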

According to another aspect of the invention, a
recombinant cloning vector comprising the DNA sequences
of the mutant DNA and fragments thereof selected from the
group of mutant protein positions consisting of 85, 148,
178, 455, 493, 507, 542, 549, 551, 563, 574, 1077 and
1092 and selected from the group of mutant DNA sequence
positions consisting of 129, 556, 621+1, 711+1, 1717-1
and 3659 is provided. The vector, according to an aspect
of this invention, is operatively linked to an expression
control sequence in the recombinant DNA molecule so that
the selected mutant DNA sequences for the mutant CFTR
polypeptide can be expressed. The expression control
sequence is selected from the group consisting of
sequences that control the expression of genes of
prokaryotic or eukaryotic cells and their viruses and
combinations thereof.

According to another aspect of the invention, a
method for producing a mutant CFTR polypeptide comprises
the steps of:

(a) culturing a host cell transfected with the
recombinant vector for the mutant DNA sequence in a
medium and under conditions favorable for expression of
the mutant CFTR polypeptide selected from the group of
mutant CFTR polypeptides at mutant protein positions 85,
148, 178, 455, 493, 507, 542, 549, 551, 560, 563, 574,
1077 and 1092 and mutant DNA sequence positions 129, 556,
621+1, 711+1, 1717-1 and 3659; and

(b) isolating the expressed mutant CFTR
polypeptide.

According to another aspect of the invention, a
purified protein comprises an amino acid sequence encoded
by the mutant DNA sequences selected from the group of
mutant protein positions of 85, 148, 178, 455, 493, 507,
542, 549, 551, 560, 563, 574, 1077 and 1092 and mutant
DNA sequence positions 129, 556, 621+1, 711+1, 1717-1 and
3659 where the protein, when present in human cell
membrane, is associated with cell function which causes
the genetic disease cystic fibrosis.

According to another aspect of the invention, a
method is provided for screening a subject to determine
if the subject is a CF carrier or a CF patient comprising
the steps of providing a biological sample of the subject
to be screened and providing an assay for detecting in
the biological sample the presence of at least a member
from the group consisting of:

(a) mutant CF gene selected from the group of
mutant protein positions 85, 148, 178, 455,
493, 507, 542, 549, 551, 560, 563, 574, 1077
and 1092 and from the group of mutant DNA
sequence positions 129, 556, 621+1, 711+1,
1717-1 and 3659;

(b) mutant CF gene products and mixtures thereof;

(c) DNA sequences which correspond to portions of
DNA sequences of boundaries of exons/introns of the
genomic CF gene;

(d) DNA sequences of at least eighteen sequential
nucleotides at boundaries of exons/introns of the genomic
CF gene depicted in Figure 18; and

(e) DNA sequences of at least eighteen sequential
nucleotides of intron portions of the genomic CF gene of
Figure 18.

According to another aspect of the invention, a kit
for assaying for the presence of a CF gene by immunoassay
techniques comprises:

(a) an antibody which specifically binds to a gene
product of the mutant DNA sequence selected from the
group of mutant protein positions 85, 148, 178, 455, 493,
507, 542, 549, 551, 560, 563, 574, 1077 and 1092 and from
the group of mutant DNA sequence positions 129, 556,
621+1, 711+1, 1717-1 and 3659;

(b) reagent means for detecting the binding of the
antibody to the gene product; and

(c) the antibody and reagent means each being
present in amounts effective to perform the immunoassay.

According to another aspect of the invention, a kit
for assaying for the presence of a mutant CF gene by
hybridization technique comprises:

(a) an oligonucleotide probe which specifically
binds to the mutant CF gene having a mutation at a
protein position selected from the group consisting of
85, 148, 178, 455, 493, 507, 542, 549, 551, 560, 563,
574, 1077 and 1092 or having a mutation at a DNA sequence
position selected from the group consisting of 129, 556,
621+1, 711+1, 1717-1 and 3659;

(b) reagent means for detecting the hybridization
of the oligonucleotide probe to the mutant CF gene; and

(c) the probe and reagent means each being present
in amounts effective to perform the hybridization assay.

According to another aspect of the invention, an
animal comprises a heterologous cell system. The cell
system includes a recombinant cloning vector which
includes the recombinant DNA sequence corresponding to
the mutant DNA sequence which induces cystic fibrosis
symptoms in the animal.

According to another aspect of the invention, in a
polymerase chain reaction to amplify a selected exon of a
cDNA sequence of Figure 1, the use of oligonucleotide
primers from intron portions near the 5' and 3'
boundaries of the selected exon of Figure 18.

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 is the nucleotide sequence of the CF gene
and the amino acid sequence of the CFTR protein, with Δ
indicating mutations at the 507 and 508 protein
positions.

Figure 2 is a restriction map of the CF gene and the
schematic strategy used to chromosome walk and jump to
the gene.

Figure 3 depicts the physical map of the region
including and surrounding the CF gene generated by pulsed
field gel electrophoresis. Panels A, B, C and D show
hybridization data for the restriction enzymes Sal I, Xho
I, Sfi I, and Nae I, respectively, generated by
representative genomic and cDNA probes which span the
region. The deduced physical map for each restriction
enzyme is shown below each panel. [...] entire [...]
interval [Rommens et al., Am. J. Hum. Genet. 45: 932-...].
The open boxed segment indicates the [...] chromosome
walking and jumping [...].
Figure 4D is a restriction map of overlapping
segments of probes E4.3 and H1.6.

Figure 5 is an RNA blot hybridization analysis using
genomic and cDNA probes. Hybridization to RNA of: A -
fibroblast with cDNA probe G-2; B - trachea (from
unafflicted and CF patient individuals), pancreas, liver,
HL60 cell line and brain with genomic probe CF16; C - T84
cell line with cDNA probe 10-1.

Figure 6 is the methylation status of the E4.3
cloned region at the 5' end of the CF gene.

Figure 7 is a restriction map of the CFTR cDNA
showing alignment of the cDNA to the genomic DNA
fragments.

Figure 8 is an RNA gel blot analysis depicting
hybridization by a portion of the CFTR cDNA (clone 10-1)
to a 6.5 kb mRNA transcript in various human tissues.

Figure 9 is a DNA blot hybridization analysis
depicting hybridization by the CFTR cDNA clones to
genomic DNA digested with EcoRI and HindIII.

Figure 10 is a primer extension experiment
characterizing the 5' and 3' ends of the CFTR cDNA.

Figure 11 is a hydropathy profile and shows
predicted secondary structures of CFTR.

Figure 12 is a dot matrix analysis of internal
homologies in the predicted CFTR polypeptide.

Figure 13 is a schematic model of the predicted CFTR
protein.

Figure 14 is a schematic diagram of the restriction
fragment length polymorphisms (RFLP's) closely linked to
the CF gene where the inverted triangle indicates the
location of the F508 3 base pair deletion.

Figure 15 represents alignment of the most conserved
segments of the extended NBFs of CFTR with comparable
regions of other proteins.

Figure 16 is the DNA sequence around the F508
deletion.

Figure 17 is a representation of the nucleotide
sequencing gel showing the DNA sequence at the F508
deletion.

Figure 18 is the nucleotide sequence of the portions
of introns and complete exons of the genomic CF gene for
27 exons identified and numbered sequentially as 1
through 24 with additional exons 6a, 6b, 14a, 14b and
17a, 17b of cDNA sequence of Figure 1;

Figure 19 shows the results of amplification of 
genomic DNA using intron oligonucleotides bounding exon 
10; 

Figure 20 shows the separation by gel 
electrophoresis of the amplified genomic DNA products of 
a CF family; and 

Figure 21 is a restriction mapping of cloned intron 
and exon portions of genomic DNA which introns and exons 
are identified in Figure 18. 

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

1. DEFINITIONS

In order to facilitate review of the various
embodiments of the invention and an understanding of
various elements and constituents used in making the
invention and using same, the following definitions of
terms used in the invention description are provided:

CF - cystic fibrosis

CF carrier - a person in apparent health whose 
chromosomes contain a mutant CF gene that may be 
transmitted to that person's offspring. 

CF patient - a person who carries a mutant CF gene 
on each chromosome, such that they exhibit the clinical 
symptoms of cystic fibrosis. 

CF gene - the gene whose mutant forms are associated
with the disease cystic fibrosis. This definition is
understood to include the various sequence polymorphisms
that exist, wherein nucleotide substitutions in the gene
sequence do not affect the essential function of the gene
product. This term primarily relates to an isolated
coding sequence, but can also include some or all of the
flanking regulatory elements and/or introns.

Genomic CF gene - the CF gene which includes 
flanking regulatory elements and/or introns at boundaries 
of exons of the CF gene. 

CF - PI - cystic fibrosis pancreatic insufficient, 
the major clinical subgroup of cystic fibrosis patients, 
characterized by insufficient pancreatic exocrine 
function. 

CF - PS - cystic fibrosis pancreatic sufficient, a 
clinical subgroup of cystic fibrosis patients with 
sufficient pancreatic exocrine function for normal 
digestion of food. 

CFTR - cystic fibrosis transmembrane conductance
regulator protein, encoded by the CF gene. This
definition includes the protein as isolated from human or
animal sources, as produced by recombinant organisms, and
as chemically or enzymatically synthesized. This
definition is understood to include the various
polymorphic forms of the protein wherein amino acid
substitutions in the variable regions of the sequence
do not affect the essential functioning of the protein
or its hydropathic profile or secondary or tertiary
structure.

DNA - standard nomenclature is used to identify the 
bases. 

Intronless DNA - a piece of DNA lacking internal
non-coding segments, for example, cDNA.

IRP locus sequence - (protooncogene int-1 related),
a gene located near the CF gene.

Mutant CFTR - a protein that is highly analogous to
CFTR in terms of primary, secondary, and tertiary
structure, but wherein a small number of amino acid
substitutions and/or deletions and/or insertions result
in impairment of its essential function, so that
organisms whose epithelial cells express mutant CFTR
rather than CFTR demonstrate the symptoms of cystic
fibrosis.

WO 91/10734 PCT/CA91/00009

mCF - a mouse gene orthologous to the human CF gene

NBFs - nucleotide (ATP) binding folds

ORF - open reading frame

PCR - polymerase chain reaction

Protein - standard single letter nomenclature is
used to identify the amino acids

R-domain - a highly charged cytoplasmic domain of
the CFTR protein

RSV - Rous Sarcoma Virus

SAP - surfactant protein

RFLP - restriction fragment length polymorphism

507 mutant CF gene - the CF gene which includes a
[...]

507 mutant CFTR protein or mutant CFTR amino acid
sequence, or mutant CFTR polypeptide - the mutant
CFTR protein wherein an amino acid deletion occurs at the
isoleucine 506 or 507 protein position of the CFTR

Protein position means amino acid residue position.

2. ISOLATING THE CF GENE

Using chromosome walking, jumping, and cDNA [...]
applications. For purposes of convenience in
understanding and isolating the CF gene and identifying
[...] 1077 and 1092 amino acid
residue positions, the technique is reiterated here.

Several transcribed sequences and [...] have
been identified in this region. One of these corresponds
to the CF gene and spans approximately 250 kb of genomic
DNA. Overlapping complementary DNA (cDNA) clones have
been isolated from epithelial cell libraries with a
genomic DNA segment containing a portion of the cystic
fibrosis gene. The nucleotide sequence of the isolated
cDNA is shown in Figures 1 through 18. In each row of
the respective sequences the lower row is a list by
standard nomenclature of the nucleotide sequence. The
upper row in each respective row of sequences is standard
single letter nomenclature for the amino acid
corresponding to the respective codon.

Accordingly, the isolation of the CF gene provided a 
cDNA molecule comprising a DNA sequence selected from the 
group consisting of: 

(a) DNA sequences which correspond to the DNA
sequence of Figure 1 from amino acid residue position 1
to position 1480;

(b) DNA sequences encoding normal CFTR polypeptide
having the sequence according to Figure 1 for amino acid
residue positions from 1 to 1480;

(c) DNA sequences which correspond to a fragment of 
the sequence of Figure 1 including at least 16 sequential 
nucleotides between amino acid residue positions 1 and 
1480; 

(d) DNA sequences which comprise at least 16 
nucleotides and encode a fragment of the amino acid 
sequence of Figure 1; and 

(e) DNA sequences encoding an epitope encoded by at 
least 18 sequential nucleotides in the sequence of Figure 
1 between amino acid residue positions 1 and 1480. 
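
Items (c) and (d) above turn on a simple test: a candidate must be a run of at least 16 sequential nucleotides taken from the reference sequence. The sketch below illustrates that test only; the reference string is an arbitrary placeholder and is not taken from Figure 1, and the function name `is_sequential_fragment` is hypothetical.

```python
def is_sequential_fragment(candidate: str, reference: str, min_length: int = 16) -> bool:
    """True if `candidate` is a contiguous run of at least `min_length`
    nucleotides found in `reference` (case-insensitive, same strand only)."""
    candidate = candidate.upper()
    reference = reference.upper()
    return len(candidate) >= min_length and candidate in reference


# Placeholder reference, 39 nucleotides long, for illustration only.
reference = "ATGCAGAGGTCGCCTCTGGAAAAGGCCAGCGTTGTCTCC"

long_probe = reference[5:25]    # 20 sequential nucleotides
short_probe = reference[5:15]   # only 10 nucleotides: below the threshold
unrelated = "A" * 16            # long enough, but not present in the reference

for probe in (long_probe, short_probe, unrelated):
    print(probe, is_sequential_fragment(probe, reference))
```

A practical screen would presumably also consider the reverse complement strand; that is omitted here to keep the sketch short.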

According to this invention, the isolation of other 
mutations in the CF gene also provides a cDNA molecule 
comprising a DNA sequence selected from the group 
consisting of: 

a) DNA sequences which correspond to the DNA
sequence encoding mutant CFTR polypeptide characterized
by cystic fibrosis-associated activity in human
epithelial cells, or the DNA sequence of Figure 1 for the
amino acid residue positions 1 to 1480 yet further
characterized by a base pair mutation which results in
the deletion of or a change for an amino acid at residue
positions 85, 148, 178, 455, 493, 507, 542, 549, 551,
560, 563, 574, 1077 and 1092;

b) DNA sequences which correspond to fragments of
the mutant portion of the sequence of paragraph a) and
which include at least sixteen nucleotides;

c) DNA sequences which comprise at least sixteen
nucleotides and encode a fragment of the amino acid
sequence encoded for by the mutant portion of the DNA
sequence of paragraph a); and

d) DNA sequences encoding an epitope encoded by at
least 18 sequential nucleotides in the mutant portion of
the sequence of the DNA of paragraph a).

Transcripts of approximately 6,500 nucleotides in
size are detectable in tissues affected in patients with
CF. Based upon the isolated nucleotide sequence, the
predicted protein consists of two similar regions, each
containing a first domain having properties consistent
with membrane association and a second domain believed to
be involved in ATP binding.

A 3 bp deletion which results in the omission of a
phenylalanine residue at the center of the first
predicted nucleotide binding domain (amino acid position
508 of the CF gene product) was detected in CF patients.
This mutation in the normal DNA sequence of Figure 1
corresponds to approximately 70% of the mutations in
cystic fibrosis patients. Extended haplotype data based
on DNA markers closely linked to the putative disease
gene suggest that the remainder of the CF mutant gene
pool consists of multiple, different mutations. This is
now exemplified by this invention at, for example, the
506 or 507 protein position. A small set of these latter
mutant alleles (approximately 8%) may confer residual
pancreatic exocrine function in a subgroup of patients
who are pancreatic sufficient.
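
The effect of a 3 bp deletion on the encoded protein can be illustrated with a short translation sketch. The standard genetic code is used; the 24-nucleotide sequence is a short illustrative stretch chosen so that two isoleucine codons precede a phenylalanine codon, and it is not copied from Figure 1. The helper names `translate` and `delete_bases` are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate DNA in frame 0, ignoring any trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def delete_bases(dna: str, start: int, length: int = 3) -> str:
    """Remove `length` bases beginning at 0-based index `start`."""
    return dna[:start] + dna[start + length:]


# Illustrative sequence: GAA AAT ATC ATC TTT GGT GTT TCC
normal = "GAAAATATCATCTTTGGTGTTTCC"
# Delete three bases straddling the second ATC codon and the TTT codon.
mutant = delete_bases(normal, start=11)

print(translate(normal))  # ENIIFGVS
print(translate(mutant))  # ENIIGVS
```

Because exactly three bases are removed, the downstream reading frame is preserved: the phenylalanine (F) drops out while every other residue in this stretch is unchanged.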

2.1 CHROMOSOME WALKING AND JUMPING

Large amounts of the DNA surrounding the D7S122 and
D7S340 linkage regions of Rommens et al., supra, were
searched for candidate gene sequences. In addition to
conventional chromosome walking methods, chromosome
jumping techniques were employed to accelerate the search
process. From each jump endpoint a new bidirectional
walk could be initiated. Sequential walks halted by
"unclonable" regions often encountered in the mammalian
genome could be circumvented by chromosome jumping.

The chromosome jumping library used has been
described previously [Collins et al., Science 235, 1046
(1987); Ianuzzi et al., Am. J. Hum. Genet. 44, 695
(1989)]. The original library was prepared from a
preparative pulsed field gel, and was intended to contain
partial EcoRI fragments of 70 - 130 kb; subsequent
experience with this library indicates that smaller
fragments were also represented, and jump sizes of 25 -
110 kb have been found. The library was plated on sup-
host MC1061 and screened by standard techniques
[Maniatis et al.]. Positive clones were subcloned into
pBRA23Ava and the beginning and end of the jump
identified by EcoRI and Ava I digestion, as described in
Collins, Genome analysis: A practical approach (IRL,
London, 1988), pp. 73-94. For each clone, a fragment
from the end of the jump was checked to confirm its
location on chromosome 7. The contiguous chromosome
region covered by chromosome walking and jumping was
about 250 kb. Direction of the jumps was biased by
careful choice of probes, as described by Collins et al.
and Ianuzzi et al., supra. The entire region cloned,
including the sequences isolated with the use of the CF
gene cDNA, is approximately 500 kb.
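
How overlapping walk clones add up to contiguous coverage, and why a jump is useful when a walk stalls, can be sketched with a simple interval merge. All clone coordinates below are hypothetical values in kb chosen for illustration; they are not the clones of Figure 2.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start_kb, end_kb) of one clone on the chromosome map


def merge_intervals(clones: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching clone intervals into contiguous stretches."""
    merged: List[List[int]] = []
    for start, end in sorted(clones):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)  # extend the current stretch
        else:
            merged.append([start, end])              # begin a new stretch
    return [(s, e) for s, e in merged]


def covered_kb(clones: List[Interval]) -> int:
    """Total length covered by the clones, counting overlaps only once."""
    return sum(end - start for start, end in merge_intervals(clones))


# Hypothetical walk: three overlapping clones, then a jump over an
# uncloned stretch, then a new walk started from the jump endpoint.
clones = [(0, 40), (30, 75), (70, 110), (150, 190), (185, 230)]

print(merge_intervals(clones))  # [(0, 110), (150, 230)]
print(covered_kb(clones))       # 190
```

In this toy layout the stretch between 110 kb and 150 kb stays uncloned; the jump lets the search continue on the far side rather than stopping at the gap.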

The schematic representation of the chromosome 
walking and jumping strategy is illustrated in Figure 2. 
CF gene exons are indicated by Roman numerals in this
Figure. Horizontal lines above the map indicate walk
steps whereas the arcs above the map indicate jump steps.
The Figure proceeds from left to right in each of six
tiers with the direction of ends toward 7cen and 7qter as
indicated. The restriction map for the enzymes EcoRI,
HindIII, and BamHI is shown above the solid line,
spanning the entire cloned region. Restriction sites
indicated with arrows rather than vertical lines indicate
sites which have not been unequivocally positioned.
Additional restriction sites for other enzymes are shown
below the line. Gaps in the cloned region are indicated
by ||. These occur only in the portion detected by cDNA
clones of the CF transcript. These gaps are unlikely to
be large based on pulsed field mapping of the region.
The walking clones, as indicated by horizontal arrows
above the map, have the direction of the arrow indicating
the walking progress obtained with each clone. Cosmid
clones begin with the letter c [...]. Roman numerals
indicate the location of exons of the CF gene. The
horizontal boxes shown above the line are probes used
during the experiments. Three of the probes represent
independent subcloning of fragments previously identified
to detect polymorphisms in this region [...]. G-2 is a
subfragment of E6 which detects a transcribed sequence.
R161, R159, and R160 are synthetic oligonucleotides
constructed from parts of the IRP locus sequence
[Wainwright et al., ...], indicating the location of this
transcript on the genomic map.
As the two independently isolated DNA markers,
D7S122 (pH131) and D7S340 (TM58), were only
approximately 10 kb apart (Figure 2), the walks and jumps
were essentially initiated from a single point. The
direction of walking and jumping with respect to MET and
D7S8 was then established with the crossing of several
rare-cutting restriction endonuclease recognition sites
(such as those for Xho I, Nru I and Not I, see Figure 2)
and with reference to the long range physical map of J.
M. Rommens et al, Am. J. Hum. Genet., in press; A. M.
Poustka et al, Genomics 2, 337 (1988); M. L. Drumm et
al, Genomics 2, 346 (1988). The pulsed field mapping
data also revealed that the Not I site identified by the
inventors of the present invention (see Figure 2,
position 113 kb) corresponded to the one previously found
associated with the IRP locus (Estivill et al 1987,
supra). Since subsequent genetic studies showed that CF
was most likely located between IRP and D7S8 [M. Farrall
et al, Am. J. Hum. Genet. 43, 471 (1988); B.S. Kerem et
al, Am. J. Hum. Genet. 44, 827 (1989)], the walking and
jumping effort was continued exclusively towards cloning
of this interval. It is appreciated, however, that other
coding regions, as identified in Figure 2, for example,
G-2, CF14 and CF16, were located and extensively
investigated. Such extensive investigations of these
other regions revealed that they were not the CF gene
based on genetic data and sequence analysis. Given the
lack of knowledge of the location of the CF gene and its
characteristics, the extensive and time consuming
examination of the nearby presumptive coding regions did
not advance the direction of search for the CF gene.
However, these investigations were necessary in order to
rule out the possibility of the CF gene being in those
regions.

Three regions in the 280 kb segment were found not
to be readily recoverable in the amplified genomic
libraries initially used. These less clonable regions
were located near the DNA segments H2.3A and X.6, and
just beyond cosmid cW44, at positions 75-100 kb, 205-225
kb, and 275-285 kb in Figure 2, respectively. The
recombinant clones near H2.3A were found to be very
unstable, with dramatic rearrangements after only a few
passages of bacterial culture. To fill in the resulting
gaps, primary walking libraries were constructed using
special host-vector systems which have been reported to
allow propagation of unstable sequences [A. R. Wyman, L.
B. Wolfe, D. Botstein, Proc. Natl. Acad. Sci. U.S.A. 82,
2880 (1985); K. F. Wertman, A. R. Wyman, D. Botstein,
Gene 49, 253 (1986); A. R. Wyman, K. F. Wertman, D.
Barker, C. Helms, W. H. Petri, Gene 49, 263 (1986)].

Genomic libraries were constructed according to
procedures known in the art [Maniatis et al, Molecular
Cloning: A Laboratory Manual, Cold Spring Harbor
Laboratory, Cold Spring Harbor, New York 1982].
This includes eight phage libraries, one of which was
provided by T. Maniatis [Fritsch et al, Cell 19:959
(1980)]; the rest were constructed as part of this
work. Four phage libraries were cloned in λDASH
(commercially available from Stratagene) and three in
λFIX (commercially available from Stratagene). One
library was constructed from Sau 3A-partially digested
DNA from a human-hamster hybrid containing human
chromosome 7 (4AF/102/K015) [Rommens et al (1988)]. To
avoid loss of unstable sequences, libraries were
propagated on the recombination-deficient hosts DB1316
(recD-), CES200 (recBC-) [Wyman et al, supra; Wertman et
al, supra; Wyman et al, supra]; or TAP90 [Patterson et al,
Nucleic Acids Res. 15:6298 (1987)]. Three cosmid
libraries were then constructed. In one the vector
pCV108 [Lau et al, Proc. Natl. Acad. Sci. USA 80:5225
(1983)] was used to clone partially digested (Sau 3A) DNA
from 4AF/102/K015 [Rommens et al (1988)]. A second
cosmid library was prepared by cloning partially digested
(Mbo I) human lymphoblastoid DNA into the vector
pWE-IL2R, prepared by inserting the RSV (Rous Sarcoma
Virus) promoter-driven cDNA for the interleukin-2
receptor α-chain (supplied by M. Fordis and B. Howard) in
place of the neo-resistance gene of pWE15 [Wahl et al,
Proc. Natl. Acad. Sci. USA 84:2160 (1987)]. An
additional partial Mbo I cosmid library was prepared in
the vector pWE-IL2R-Sal, created by inserting a Sal I
linker into the Bam HI cloning site of pWE-IL2R (M.
Drumm, unpublished data); this allows the use of the
partial fill-in technique to ligate Sal I and Mbo I ends,
preventing tandem insertions [Zabarovsky et al, Gene
42:119 (1986)]. Cosmid libraries were propagated in E.
coli host strains DH1 or 490A [M. Steinmetz, A. Winoto,
K. Minard, L. Hood, Cell 28, 489 (1982)].
TABLE 1
GENOMIC LIBRARIES

Vector        Source of human DNA               Host    Complexity  Ref

              liver DNA (amplified)                                 Lawn et al
                                                                    1980

pCV108        Sau3A-partially digested          DK1     3 x 10^6
              DNA from 4AF/KO15 (amplified)

λDASH         Sau3A-partially digested          LE392   1 x 10^6
              DNA from 4AF/KO15 (amplified)

λDASH         Sau3A-partially digested          DB1316  1.5 x 10^6
              total human peripheral
              blood DNA

λDASH         BamHI-digested total              DB1316  1.5 x 10^6
              human peripheral blood
              DNA

λDASH         EcoRI-partially digested          DB1316  8 x 10^6

λFIX          MboI-partially digested           LE392   1.5 x 10^6
              human lymphoblastoid DNA

λFIX          MboI-partially digested           CES200  1.2 x 10^6
              human lymphoblastoid DNA

λFIX          MboI-partially digested           TAP90   1.3 x 10^6
              human lymphoblastoid DNA

pWE-IL2R      MboI-partially digested           490A    5 x 10^5
              human lymphoblastoid DNA

pWE-IL2R-Sal  MboI-partially digested                   1.2 x 10^6
              human lymphoblastoid DNA

(jumping)     human lymphoblastoid DNA                  3 x 10^6    Collins et al,
                                                                    supra;
                                                                    Iannuzzi
                                                                    et al, supra

Three of the phage libraries were propagated and
amplified in E. coli bacterial strain LE392. Four
subsequent libraries were plated on the recombination-
deficient hosts DB1316 (recD-) or CES200 (recBC-) [Wyman
1985, supra; Wertman 1986, supra; and Wyman 1986, supra]
or in one case TAP90 [T.A. Patterson and M. Dean, Nucleic
Acids Research 15, 6298 (1987)].

Single copy DNA segments (free of repetitive
elements) near the ends of each phage or cosmid insert
were purified and used as probes for library screening to
isolate overlapping DNA fragments by standard procedures
(Maniatis et al, supra).
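
As a rough illustration of why libraries of this complexity are adequate for walking, the standard Clarke-Carbon estimate N = ln(1 - P) / ln(1 - f) relates the number of independent clones N to the probability P that a given single-copy segment is represented. The sketch below is not taken from the specification. The haploid genome size of 3 x 10^9 bp and the 15 kb average insert are assumptions chosen for illustration only; the complexity of 1.5 x 10^6 is one of the values listed in Table 1.

```python
import math

GENOME_BP = 3e9      # assumed haploid human genome size (illustrative)
INSERT_BP = 15_000   # assumed average phage insert size (hypothetical)

def clones_needed(genome_bp, insert_bp, probability):
    """Clarke-Carbon estimate: N = ln(1 - P) / ln(1 - f), f = insert/genome."""
    f = insert_bp / genome_bp
    return math.ceil(math.log(1.0 - probability) / math.log(1.0 - f))

def coverage_probability(genome_bp, insert_bp, n_clones):
    """Probability that a given sequence is present among n_clones."""
    f = insert_bp / genome_bp
    return 1.0 - (1.0 - f) ** n_clones

if __name__ == "__main__":
    print(clones_needed(GENOME_BP, INSERT_BP, 0.99))
    print(round(coverage_probability(GENOME_BP, INSERT_BP, 1_500_000), 4))
```

Under these assumptions a library of about 10^6 independent clones gives a high chance of recovering any one segment, although the estimate ignores cloning bias, which is the problem the recombination-deficient hosts address.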

1-2 x 10^6 phage clones were plated on 25-30 150 mm
Petri dishes with the appropriate indicator bacterial
host and incubated at 37°C for 10-16 hr. Duplicate
"lifts" were prepared for each plate with nitrocellulose
or nylon membranes, prehybridized and hybridized under
conditions described (Rommens et al, 1988, supra).
Probes were labelled with 32P to a specific activity of >5
x 10^8 cpm/μg using the random priming procedure (A. P.
Feinberg and B. Vogelstein, Anal. Biochem. 132, 6
(1983)). The cosmid library was spread on ampicillin-
containing plates and screened in a similar manner.

DNA probes which gave high background signals could 
often be used more successfully by preannealing the 
boiled probe with 250 ng/ml sheared denatured placental 
DNA for 60 minutes prior to adding the probe to the 
hybridization bag. 

For each walk step, the identity of the cloned DNA 
fragment was determined by hybridization with a somatic 
cell hybrid panel to confirm its chromosomal location, 
and by restriction mapping and Southern blot analysis to 
confirm its colinearity with the genome. 

The total combined cloned region of the genomic DNA
sequences isolated and the overlapping cDNA clones
extended >500 kb. To ensure that the DNA segments
isolated by the chromosome walking and jumping procedures
were colinear with the genomic sequence, each segment was
examined by:

(a) hybridization analysis with human-rodent somatic
hybrid cell lines to confirm chromosome 7 localization,

(b) pulsed field gel electrophoresis, and

(c) comparison of the restriction map of the cloned
DNA to that of the genomic DNA.

Accordingly, single copy human DNA sequences were
isolated from each recombinant phage and cosmid clone and
used as probes in each of these hybridization analyses.

While the majority of phage and cosmid isolates
represented correct walk and jump clones, a few resulted
from cloning artifacts or cross-hybridizing sequences
from other regions in the human genome, or from the
hamster genome in cases where the libraries were derived
from a human-hamster hybrid cell line. Confirmation of
correct localization was particularly important for
clones isolated by chromosome jumping. Many jump clones
were considered and resulted in nonconclusive
information, leading the direction of investigation away
from the gene.

CONFIRMATION OF THE RESTRICTION MAP

Further confirmation of the overall physical map of
the overlapping clones was obtained by long range
restriction mapping analysis with the use of pulsed field
gel electrophoresis (J. M. Rommens et al, Am. J. Hum.
Genet., in press; A. M. Poustka et al, 1988, supra; M. L.
Drumm et al, 1988, supra).

Figures 3A to 3E illustrate the findings of the
long range restriction mapping study. DNA from the
human-hamster cell line 4AF/102/K015 was digested with
the enzymes (A) Sal I, (B) Xho I, (C) Sfi I and (D)
Nae I, separated by pulsed field gel electrophoresis,
and blotted. For each enzyme a single blot was
sequentially hybridized
with the probes indicated below each of the panels of
Figures 3A to 3D, with stripping of the blot between
hybridizations. The symbols for each enzyme of Figure 3E
are: A, Nae I; B, Bss HII; F, Sfi I; L, Sal I; M, Mlu I;
N, Not I; R, Nru I; and X, Xho I. C corresponds to the
compression zone region of the gel. DNA preparations,
restriction digestion, and crossed field gel
electrophoresis methods have been described (Rommens et
al, in press, supra). The gels in Figure 3 were run in
0.5X TBE at 7 volts/cm for 20 hours with switching
linearly ramped from 10-40 seconds for (A), (B), and (C),
and at 8 volts/cm for 20 hours with switching ramped
linearly from 50-150 seconds for (D). Schematic
interpretations of the hybridization pattern are given
below each panel. Fragment lengths are in kilobases and
were sized by comparison to oligomerized bacteriophage
λ DNA and Saccharomyces cerevisiae chromosomes.

H4.0, J44 and EG1.4 are genomic probes generated from
the walking and jumping experiments (see Figure 2). J30
has been isolated by four consecutive jumps from D7S8
(Collins et al, 1987, supra; Iannuzzi et al, supra;
Dean et al, submitted for publication). 10-1, B.75
and CE1.5/1.0 are cDNA probes which cover different
regions of the CF transcript: 10-1 contains exons I -
VI, B.75 contains exons V - XII, and CE1.5/1.0 contains
exons XII - XXIV. Shown in Figure 3E is a composite map
of the entire MET - D7S8 interval. The boxed region
indicates the segment cloned by walking and jumping, and
the arrow portion indicates the region covered by
the CF transcript. The CpG-rich region associated with
the D7S23 locus (Estivill et al, 1987, supra) is at the
Not I site shown in parentheses. This and other sites
shown in parentheses or square brackets do not cut in
4AF/102/K015, but have been observed in human
lymphoblast cell lines.
Based on the findings of long range restriction
mapping detailed above, it was determined that the entire
CF gene is contained on a 380 kb Sal I fragment.
Alignment of the restriction sites derived from pulsed
field gel analysis to those identified in the partially
overlapping genomic DNA clones revealed that the size of
the CF gene was approximately 250 kb.

The most informative restriction enzyme that served
to align the map of the cloned DNA fragments and the long
range restriction map was Xho I; all of the Xho I sites
identified with the recombinant DNA clones appeared to be
susceptible to at least partial cleavage in genomic DNA
(compare maps in Figures 2 and 3). Furthermore,
hybridization analysis with probes derived from the 3'
end of the CF gene identified 2 Sfi I sites and confirmed
the position of an anticipated Nae I site.
These findings further supported the conclusion that the
DNA segments isolated by the chromosome walking and
jumping procedures were colinear with the genomic
sequence.

CRITERIA FOR IDENTIFICATION

A positive result based on one or more of the
following criteria suggested that a cloned DNA segment
may contain candidate gene sequences:

(a) detection of cross-hybridizing sequences in
other species (as many genes show evolutionary
conservation),

(b) identification of CpG islands, which often mark
the 5' end of vertebrate genes,

(c) examination of possible mRNA transcripts in
tissues affected in CF patients,

(d) isolation of corresponding cDNA sequences,

(e) identification of open reading frames by direct
sequencing of cloned DNA segments.
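
Criterion (e) can be illustrated with a small six-frame scan for open reading frames. The following is a minimal sketch, not the procedure used in the work described; the function names and the toy sequence are hypothetical, and an ORF is taken here to mean an ATG followed in frame by a stop codon.

```python
STOPS = {"TAA", "TAG", "TGA"}

def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def longest_orf(seq):
    """Return (length_nt, strand, start) of the longest ATG..stop ORF.

    All three frames of both strands are scanned; start is a 0-based
    index on the scanned strand. Returns (0, "+", 0) if none is found.
    """
    best = (0, "+", 0)
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOPS:
                    length = i + 3 - start  # includes the stop codon
                    if length > best[0]:
                        best = (length, strand, start)
                    start = None
    return best

if __name__ == "__main__":
    toy = "CCATGAAATTTGGGTAACC"  # hypothetical sequence
    print(longest_orf(toy))
```

In practice only short ORFs were found in the genomic segments discussed below, which is why the other criteria carried more weight at this locus.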
Cross-species hybridization showed strong sequence
conservation between human and bovine DNA when CF14, E4.3
and H1.6 were used as probes, the results of which are
shown in Figures 4A, 4B and 4C.

Human, bovine, mouse, hamster, and chicken genomic
DNAs were digested with Eco RI (R), Hind III (H), and Pst
I (P), electrophoresed, and blotted to Zetabind
(BioRad). The hybridization procedures of Rommens et al,
1988, supra, were used with the most stringent wash at
55°C, 0.2X SSC, and 0.1% SDS. The probes used for
hybridization, in Figure 4, included: (A) entire cosmid
CF14, (B) E4.3, (C) H1.6. In the schematic of Figure
4(D), the shaded region indicates the area of cross-
species conservation.

The fact that different subsets of bands were
detected in bovine DNA with these two overlapping DNA
segments (H1.6 and E4.3) suggested that the conserved
sequences were located at the boundaries of the
overlapped region (Figure 4(D)). When these DNA segments
were used to detect RNA transcripts from a variety of
tissues, no hybridization signal was detected. In an
attempt to understand the cross-hybridizing region and to
identify possible open reading frames, the DNA sequences
of the entire H1.6 and part of the E4.3 fragment were
determined. The results showed that, except for a long
stretch of CG-rich sequence containing the recognition
sites for two restriction enzymes (Bss HII and Sac II)
often found associated with undermethylated CpG islands,
there were only short open reading frames which could not
easily explain the strong cross-species hybridization
signals.

To examine the methylation status of this highly
CpG-rich region revealed by sequencing, genomic DNA
samples prepared from fibroblasts and lymphoblasts were
digested with the restriction enzymes Hpa II and Msp I
and analyzed by gel blot hybridization. The enzyme Hpa
II cuts the DNA sequence 5'-CCGG-3' only when the second
cytosine is unmethylated, whereas Msp I cuts this
sequence regardless of the state of methylation. Small
DNA fragments were generated by both enzymes, indicating
that this CpG-rich region is indeed undermethylated in
genomic DNA. The gel-blot hybridization with the E4.3
segment (Figure 6) reveals very small hybridizing
fragments with both enzymes, indicating the presence of a
hypomethylated CpG island.
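
The logic of the Hpa II / Msp I comparison can be sketched in a few lines. This is an illustrative model only, following the behaviour stated above: Hpa II cuts CCGG only when the second cytosine is unmethylated, and Msp I cuts regardless. The toy sequence and the set of methylated positions are hypothetical.

```python
SITE = "CCGG"

def cut_sites(seq, enzyme, methylated=frozenset()):
    """Return 0-based start indices of CCGG sites cut by the enzyme.

    methylated holds indices of methylated cytosines; only the second
    C of each site (index + 1) matters in this simplified model.
    """
    cuts = []
    i = seq.find(SITE)
    while i != -1:
        inner_c = i + 1
        if enzyme == "MspI" or (enzyme == "HpaII" and inner_c not in methylated):
            cuts.append(i)
        i = seq.find(SITE, i + 1)
    return cuts

def fragment_lengths(seq, cuts):
    """Fragment sizes after cleavage; both enzymes cut C^CGG."""
    bounds = [0] + [c + 1 for c in cuts] + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]

if __name__ == "__main__":
    toy = "AAACCGGTTTTCCGGAA"  # hypothetical sequence, sites at 3 and 11
    for enzyme in ("MspI", "HpaII"):
        cuts = cut_sites(toy, enzyme, methylated={12})
        print(enzyme, fragment_lengths(toy, cuts))
```

When the two digests give the same small fragments, as reported for the E4.3 region, the sites are unmethylated; a methylated site would leave a larger Hpa II fragment, as the toy example shows.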

The above results strongly suggest the presence of a
coding region at this locus. Two DNA segments (E4.3 and
H1.6) which detected cross-species hybridization signals
from this area were used as probes to screen cDNA
libraries made from several tissues and cell types.

cDNA libraries from cultured epithelial cells were
prepared as follows. Sweat gland cells derived from a
non-CF individual and from a CF patient were grown to
first passage as described [G. Collie et al, In Vitro
Cell. Dev. Biol.]. The presence of outwardly rectifying
channels was confirmed in these cells (J.A. Tabcharani,
T.J. Jensen, J.R. Riordan, J.W. Hanrahan, in press), but
the CF cells were insensitive to activation by cyclic AMP
(T.J. Jensen, J.W. Hanrahan, J.A. Tabcharani, M. Buchwald
and J.R. Riordan). RNA was isolated from these cells and
poly A+ RNA was used as template for the synthesis of
cDNA with oligo(dT) as a primer. The double-stranded
cDNA was methylated with Eco RI methylase and ends were
made flush with T4 DNA polymerase. Phosphorylated Eco RI
linkers were ligated to the cDNA and restricted with Eco
RI. Removal of excess linkers and partial size
fractionation was achieved by Biogel A-50
chromatography. The cDNAs were then ligated into the Eco
RI site of the commercially
available lambda ZAP. Recombinants were packaged and
propagated in E. coli BB4. Portions of the packaging
mixes were amplified and the remainder retained for
screening prior to amplification. The same procedures
were used to construct a library from RNA isolated from
preconfluent cultures of the T-84 colonic carcinoma cell
line (Dharmsathaphorn, K. et al, Am. J. Physiol. 246,
G204, 1984). The numbers of independent recombinants in
the three libraries were: 2 x 10^6 for the non-CF sweat
gland cells, 4.5 x 10^6 for the CF sweat gland cells and
3.2 x 10^6 from T-84 cells. These phages were plated at
50,000 per 15 cm plate and plaque lifts made using nylon
membranes (Biodyne) and probed with DNA fragments
labelled with 32P using DNA polymerase I and a random
mixture of oligonucleotides as primer. Hybridization
conditions were according to G.M. Wahl and S.L. Berger
(Meth. Enzymol. 152, 415, 1987). Bluescript plasmids
were rescued from plaque purified clones by excision with
M13 helper phage. The lung and pancreas libraries were
purchased from Clontech Lab Inc. with reported sizes of
1.4 x 10^6 and 1.7 x 10^6 independent clones.
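
To give a sense of the scale of the screening, the number of 15 cm plates needed to plate each library once at the stated density of 50,000 phage per plate can be worked out directly. This is a back-of-envelope sketch; it assumes the library complexities read as 2 x 10^6, 4.5 x 10^6 and 3.2 x 10^6, and the dictionary keys are labels chosen here for illustration.

```python
PLATING_DENSITY = 50_000  # phage per 15 cm plate, as stated in the text

# Independent recombinants per library (assumed reading of the text)
LIBRARIES = {
    "non-CF sweat gland": 2_000_000,
    "CF sweat gland": 4_500_000,
    "T-84": 3_200_000,
}

def plates_needed(n_clones, density=PLATING_DENSITY):
    """Smallest number of plates that holds n_clones at the given density."""
    return -(-n_clones // density)  # integer ceiling division

if __name__ == "__main__":
    for name, size in LIBRARIES.items():
        print(f"{name}: {plates_needed(size)} plates")
```

One pass through each library therefore requires on the order of 40 to 90 plates under these assumptions.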

After screening 7 different libraries each
containing 1 x 10^5 - 5 x 10^6 independent clones, 1 single
clone (identified as 10-1) was isolated with H1.6 from a
cDNA library made from the cultured sweat gland
epithelial cells of an unaffected (non-CF) individual.

DNA sequencing analysis showed that probe 10-1
contained an insert of 920 bp in size and one potential
long open reading frame (ORF). Since one end of the
sequence shared perfect sequence identity with H1.6, it
was concluded that the cDNA clone was probably derived
from this region. The DNA sequence in common was,
however, only 113 bp long (see Figures 1 and 7). As
detailed below, this sequence in fact corresponded to the
5'-most exon of the putative CF gene. The short sequence
overlap thus explained the weak hybridization signals in
library screening and inability to detect transcripts in
RNA gel-blot analysis. In addition, the orientation of
the transcription unit was tentatively established on the
basis of alignment of the genomic DNA sequence with the
presumptive ORF of 10-1.

WO 91/10734                                PCT/CA91/00009

Since the corresponding transcript was estimated to 
be approximately 6500 nucleotides in length by RNA gel- 
blot hybridization experiments, further cDNA library 
screening was required in order to clone the remainder of 
the coding region. As a result of several successive 
screenings with cDNA libraries generated from the colonic 
carcinoma cell line T84, normal and CF sweat gland cells, 
pancreas and adult lungs, 18 additional clones were 
isolated (Figure 7, as subsequently discussed in greater 
detail) . DNA sequence analysis revealed that none of 
these cDNA clones corresponded to the length of the 
observed transcript, but it was possible to derive a 
consensus sequence based on overlapping regions. 
Additional cDNA clones corresponding to the 5' and 3'
ends of the transcript were derived from 5' and 3'
primer-extension experiments. Together, these clones
span a total of about 6.1 kb and contain an ORF capable
of encoding a polypeptide of 1480 amino acid residues
(Figure 1).
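
The figures quoted above are mutually consistent, as a short calculation shows. The sketch below assumes one terminating stop codon and takes "about 6.1 kb" as 6100 nucleotides for the purpose of illustration; the function name is hypothetical.

```python
CDNA_LENGTH_NT = 6100   # "about 6.1 kb", rounded for illustration
RESIDUES = 1480         # amino acid residues encoded by the ORF

def orf_length_nt(residues, include_stop=True):
    """Nucleotides needed to encode the residues, plus an optional stop codon."""
    return 3 * residues + (3 if include_stop else 0)

if __name__ == "__main__":
    coding = orf_length_nt(RESIDUES)
    untranslated = CDNA_LENGTH_NT - coding
    print(coding, untranslated, round(coding / CDNA_LENGTH_NT, 2))
```

On these numbers the ORF occupies roughly three quarters of the cloned sequence, with the remainder attributable to 5' and 3' untranslated regions.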

It was unusual to observe that most of the cDNA
clones isolated here contained sequence insertions at
various locations of the restriction map of Figure 7.
The map details the genomic structure of the CF gene.
Exon/intron boundaries are given where all cDNA clones
isolated are schematically represented on the upper half
of the figure. Many of these extra sequences clearly
corresponded to intron regions reverse transcribed
during the construction of the cDNA, as revealed upon
alignment with genomic DNA sequences.

Since the number of recombinant cDNA clones for the
CF gene detected in the library screening was much less
than would have been expected from the abundance of
transcript estimated from RNA hybridization experiments,
it seemed probable that the clones that contained
aberrant structures were preferentially retained while
the proper clones were lost during propagation.
Consistent with this interpretation, poor growth was
observed for the majority of the recombinant clones
isolated in this study, regardless of the vector used.

The procedures used to obtain the 5' and 3' ends of
the cDNA were similar to those described (M. Frohman et
al, Proc. Natl. Acad. Sci. USA, 85, 8998-9002, 1988). For
the 5' end clones, total pancreas and T84 poly A+ RNA
samples were reverse transcribed using a primer, (10b),
which is specific to exon 2, similarly as has been
described for the primer extension reaction except that
radioactive tracer was included in the reaction. The
fractions collected from an agarose bead column of the
first strand synthesis were assayed by polymerase chain
reaction (PCR) of eluted fractions. The oligonucleotides
used were within the 10-1 sequence (145 nucleotides
apart) just 5' of the extension primer. The earliest
fractions yielding PCR product were pooled and
concentrated by evaporation and subsequently tailed with
terminal deoxynucleotidyl transferase (BRL Labs.) and
dATP as recommended by the supplier (BRL Labs). A second
strand synthesis was then carried out with Taq Polymerase
(Cetus, AmpliTaq) using an oligonucleotide containing a
tailed linker sequence 5'CGGAATTCTCGAGATC(T)12 3'.
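
The tailed linker can be checked for the restriction sites used in the later cloning steps. This is a minimal sketch that assumes the linker reads 5'-CGGAATTCTCGAGATC(T)12-3' (the printed sequence is partly garbled in this copy); the recognition sequences used are the standard ones for Eco RI (GAATTC), Xho I (CTCGAG) and Bgl II (AGATCT), and the function name is hypothetical.

```python
# Assumed reading of the tailed linker: 16-nt adapter plus a (T)12 tail
LINKER = "CGGAATTCTCGAGATC" + "T" * 12

# Standard recognition sequences
SITES = {"EcoRI": "GAATTC", "XhoI": "CTCGAG", "BglII": "AGATCT"}

def find_sites(seq, sites):
    """Map each enzyme name to the 0-based index of its first site (-1 if absent)."""
    return {name: seq.find(pattern) for name, pattern in sites.items()}

if __name__ == "__main__":
    print(len(LINKER), find_sites(LINKER, SITES))
```

The Xho I site in the linker is what allows the 3' anchored PCR products described below to be restricted with Hind III and Xho I before cloning.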

Amplification by an anchored PCR experiment using
the linker sequence and a primer just internal to the
extension primer which possessed the Eco RI restriction
site at its 5' end was then carried out. Following
restriction with the enzymes Eco RI and Bgl II and
agarose gel purification, size selected products were
cloned into the plasmid Bluescript KS available from
Stratagene by standard procedures (Maniatis et al,
supra). Essentially all of the recovered clones
contained inserts of less than 350 nucleotides. To
obtain the 3' end clones, first strand cDNA was prepared
with reverse transcription of 2 μg T84 poly A+ RNA using
the tailed linker oligonucleotide previously described
with conditions similar to those of the primer extension.
Amplification by PCR was then carried out with the linker
oligonucleotide and three different oligonucleotides
corresponding to known sequences of clone T16-4.5. A
preparative scale reaction (2 x 100 μl) was carried out
with one of these oligonucleotides with the sequence
5'ATGAAGTCCAAGGATTTAG3'.

This oligonucleotide is approximately 70 nucleotides
upstream of a Hind III site within the known sequence of
T16-4.5. Restriction of the PCR product with Hind III
and Xho I was followed by agarose gel purification to
size select a band at 1.0-1.4 kb. This product was then
cloned into the plasmid Bluescript KS available from
Stratagene. Approximately 20% of the obtained clones
hybridized to the 3' end portion of T16-4.5. 10/10 of
plasmids isolated from these clones had identical
restriction maps with insert sizes of approx. 1.2 kb.
All of the PCR reactions were carried out for 30 cycles
in buffer suggested by an enzyme supplier.
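
Basic properties of the 3' end primer can be computed from its sequence. The sketch below uses the Wallace rule, Tm = 2(A+T) + 4(G+C), which is only a rough rule of thumb for short oligonucleotides; it is offered as an illustration and is not a value reported in the specification. The function names are hypothetical.

```python
PRIMER = "ATGAAGTCCAAGGATTTAG"  # 3' end primer sequence given in the text

def gc_count(seq):
    """Number of G and C bases in the sequence."""
    return sum(seq.count(base) for base in "GC")

def wallace_tm(seq):
    """Rough melting temperature in degrees C: 2*(A+T) + 4*(G+C)."""
    gc = gc_count(seq)
    at = len(seq) - gc
    return 2 * at + 4 * gc

if __name__ == "__main__":
    print(len(PRIMER), gc_count(PRIMER), wallace_tm(PRIMER))
```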

An extension primer positioned 157 nt from the 5' end
of the 10-1 clone was used to identify the start point of
the putative CF transcript. The primer was end labelled
with γ-[32P]ATP at 5000 Curies/mmole and T4 polynucleotide
kinase and purified by spun column gel filtration. The
radiolabeled primer was then annealed with 4-5 μg poly A+
RNA prepared from T-84 colonic carcinoma cells in 2X
reverse transcriptase buffer for 2 hrs. at 60°C.
Following dilution and addition of AMV reverse
transcriptase (Life Sciences, Inc.) incubation at 41°C
proceeded for 1 hour. The sample was then adjusted to
0.4M NaOH and 20 mM EDTA, and finally neutralized with
NH4OAc, pH 4.6, phenol extracted, ethanol precipitated,
redissolved in buffer with formamide, and analyzed on a
polyacrylamide sequencing gel. Details of these methods
have been described (Meth. Enzymol. 152, 1987, Ed. S.L.
Berger, A.R. Kimmel, Academic Press, N.Y.).

Results of the primer extension experiment using an
extension oligonucleotide primer starting 157 nucleotides
from the 5' end of 10-1 are shown in Panel A of Figure 10.
End labelled φX174 bacteriophage digested with Hae III
(BRL Labs) is used as size marker. Two major products
are observed at 216 and 100 nucleotides. The sequence
corresponding to 100 nucleotides in 10-1 corresponds to a
very GC rich sequence (11/12), suggesting that this could
be a reverse transcriptase pause site. The 5' anchored
PCR results are shown in panel B of Figure 10. The 1.4%
agarose gel shown on the left was blotted and transferred
to Zetaprobe membrane (Bio-Rad Lab). DNA gel blot
hybridization with radiolabeled 10-1 is shown on the
right. The 5' extension products are seen to vary in
size from 170-280 nt with the major product at about 200
nucleotides. The PCR control lane shows a fragment of
145 nucleotides. It was obtained by using the test
oligomers within the 10-1 sequence. The size markers
shown correspond to sizes of 154, 220/210, 298, 344, 394
nucleotides (1 kb ladder purchased from BRL Lab).

The schematic shown below Panel B of Figure 10
outlines the procedure to obtain double stranded cDNA
used for the amplification and cloning to generate the
clones PA3-5 and TB2-7 shown in Figure 7. The anchored
PCR experiments to characterize the 3' end are shown in
panel C. As depicted in the schematic below Figure 10C,
three primers whose relative position to each other were
known were used for amplification with reverse
transcribed T84 RNA as described. These products were
separated on a 1% agarose gel and blotted onto nylon
membrane as described above. DNA-blot hybridization with
the 3' portion of the T16-4.5 clone yielded bands of
sizes that corresponded to the distance between the
specific oligomer used and the 3' end of the transcript.
These bands in lanes 1, 2a and 3 are shown schematically
below Panel C in Figure 10. The band in lane 3 is weak
as only 60 nucleotides of this segment overlap with the
probe used. Also indicated in the schematic and shown
in lane 2b is the product generated by restriction of
the anchored PCR product to facilitate cloning to
generate the TH2-4 clone shown in Figure 7.

DNA-blot hybridization analysis of genomic DNA
digested with EcoRI and HindIII enzymes probed with
portions of cDNAs spanning the entire transcript suggests
that the gene contains at least 26 exons numbered as
Roman numerals I through XXVI (see Figure 9). These
correspond to the numbers 1 through 26 shown in Figure 7.
The size of each band is given in kb.

In Figure 7, open boxes indicate approximate
positions of the exons which have been identified by
the isolation of >22 clones from the screening of cDNA
libraries and from anchored PCR experiments designed to
clone the 5' and 3' ends. The lengths in kb of the
genomic fragments detected by each exon are also
indicated. The hatched boxes in Figure 7 indicate the
presence of intron sequences and the stippled boxes
indicate other sequences. Depicted in the lower left by
the closed box is the relative position of the clone H1.6
used to detect the first cDNA clone 10-1 from among the
phage of the normal sweat gland library. As shown in
Figures 4(D) and 7, the genomic clone H1.6 partially
overlaps with E4.3. All of the cDNA clones shown were
hybridized to genomic DNA and/or were fine restriction
mapped.

With reference to Figure 9, the hybridization
analysis includes probes, i.e., cDNA clones 10-1 for
panel A, T16-1 (3' portion) for panel B, T16-4.5
(central portion) for panel C and T16-4.5 (3' end
portion) for panel D. In panel A of Figure 9, the cDNA
probe 10-1 detects the genomic bands for exons I through
VI. The 3' portion of T16-1 generated by NruI
restriction detects exons IV through XIII as shown in
Panel B. This probe partially overlaps with 10-1.
Panels C and D, respectively, show genomic bands detected
by the central and 3' end EcoRI fragments of the clone
T16-4.5. Two EcoRI sites occur within the cDNA sequence
and split exons XIII and XIX. As indicated by the exons
in parentheses, two genomic EcoRI bands correspond to
each of these exons. Cross hybridization to other
genomic fragments was observed. These bands, indicated
by N, are not of chromosome 7 origin as they did not
appear in human-hamster hybrids containing human
chromosome 7. The faint band in panel D indicated by XI
in brackets is believed to be caused by the cross-
hybridization of sequences due to internal homology with
the cDNA.

Since 10-1 detected a strong band on gel blot
hybridization of RNA from the T-84 colonic carcinoma cell
line, this cDNA was used to screen the library
constructed from that source. Fifteen positives were
obtained from which clones T6, T6/20, T11, T16-1 and
T13-1 were purified and sequenced. Rescreening of the
same library with a 0.75 kb Bam HI-Eco RI fragment from
the 3' end of T16-1 yielded T16-4.5. A 1.8 kb EcoRI
fragment from the 3' end of T16-4.5 yielded T8-B3 and
T12a, the latter of which contained a polyadenylation
signal and tail. Simultaneously a human lung cDNA
library was screened; many clones were isolated including
those shown here with the prefix 'CDL'. A pancreas
library was also screened, yielding clone CDPJ5.

To obtain copies of this transcript from a CF
patient, a cDNA library from RNA of sweat gland
epithelial cells from a patient was screened with the
0.75 kb Bam HI-Eco RI fragment from the 3' end of T16-1
and clones C16-1 and C1-1/5, which covered all but exon
1, were isolated. These two clones both exhibit a 3 bp
deletion in exon 10 which is not present in any other
clone containing that exon. Several clones, including
CDLS26-1 from the lung library and T6/20 and T13-1
isolated from T84, were derived from partially processed
transcripts. This was confirmed by genomic hybridization
and by sequencing across the exon-intron boundaries for
each clone. T11 also contained additional sequence at
each end. T16-4.5 contained a small insertion near the
boundary between exons 10 and 11 that did not correspond
to intron sequence. Clones CDLS16A, 11a and 13a from the
lung library also contained extraneous sequences of
unknown origin. The clone C16-1 also contained a short
insertion corresponding to a portion of the γ-transposon
of E. coli; this element was not detected in the other
clones. The 5' clones PA3-5, generated from pancreas RNA,
and TB2-7, generated from T84 RNA using the anchored PCR
technique, have identical sequences except for a single
nucleotide difference in length at the 5' end as shown in
Figure 1. The 3' clone, TH2-4, obtained from T84 RNA
contains the 3' sequence of the transcript in concordance
with the genomic sequence of this region.
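
A 3 bp deletion of the kind noted above in the patient-derived clones can be located by comparing a variant sequence with a reference. The following is a minimal sketch; the two sequences are hypothetical toy strings, not CF gene sequence, and the function handles only a single contiguous deletion.

```python
def find_deletion(reference, variant):
    """Locate a single contiguous deletion in variant relative to reference.

    Returns (position, deleted_bases) with a 0-based position on the
    reference, or None if the difference is not one clean deletion.
    Within repeated bases the reported position is the right-most
    equivalent placement.
    """
    if len(variant) >= len(reference):
        return None
    gap = len(reference) - len(variant)
    p = 0
    while p < len(variant) and reference[p] == variant[p]:
        p += 1  # extend the shared prefix
    if reference[p + gap:] == variant[p:]:
        return p, reference[p:p + gap]
    return None

if __name__ == "__main__":
    normal = "ATGGCCATCTTTGGAAAC"   # hypothetical reference
    patient = "ATGGCCATTGGAAAC"     # hypothetical variant
    hit = find_deletion(normal, patient)
    print(hit, "in-frame" if hit and len(hit[1]) % 3 == 0 else "frameshift")
```

A deletion whose length is a multiple of three removes whole codons' worth of sequence and leaves the downstream reading frame intact, which is why such a change is compatible with an otherwise full-length ORF.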

A combined sequence representing the presumptive
coding region of the CF gene was generated from
overlapping cDNA clones. Since most of the cDNA clones
were apparently derived from unprocessed transcripts,
further studies were performed to ensure the authenticity
of the combined sequence. Each cDNA clone was first
tested for localization to chromosome 7 by hybridization
analysis with a human-hamster somatic cell hybrid
containing a single human chromosome 7 and by pulsed
field gel electrophoresis. Fine restriction enzyme
mapping was also performed for each clone. While
overlapping regions were clearly identifiable for most of
the clones, many contained regions of unique restriction
patterns.

To further characterize these cDNA clones, they were
used as probes in gel hybridization experiments with
EcoRI- or HindIII-digested human genomic DNA. As shown in
Figure 9, five to six different restriction fragments
could be detected with the 10-1 cDNA and a similar number
of fragments with other cDNA clones, suggesting the
presence of multiple exons for the putative CF gene. The
hybridization studies also identified those cDNA clones
with unprocessed intron sequences as they showed
preferential hybridization to a subset of genomic DNA
fragments. For the confirmed cDNA clones, their
corresponding genomic DNA segments were isolated and the
exons and exon/intron boundaries sequenced. As indicated
in Figure 7, at least 27 exons have been identified,
which includes split exons 6a, 6b, 14a, 14b and 17a, 17b.
Based on this information and the results of physical
mapping experiments, the gene locus was estimated to span
250 kb on chromosome 7.

2.6 THE SEQUENCE 

Figure 1 shows the nucleotide sequence of the cloned 
cDNA encoding CFTR together with the deduced amino acid 
sequence. The first base position corresponds to the 
first nucleotide in the 5' extension clone PA3-5, which is 
one nucleotide longer than TB2-7. Arrows indicate the 
position of the transcription initiation site by primer 
extension analysis. Nucleotide 6129 is followed by a 
poly(dA) tract. Positions of exon junctions are 
indicated by vertical lines. Potential membrane-spanning 
segments were ascertained using the algorithm of 
Eisenberg et al., J. Mol. Biol. 179:125 (1984). Potential 
membrane-spanning segments as analyzed and shown in 
Figure 11 are enclosed in boxes in Figure 1. In Figure 
11, the mean hydropathy index [Kyte and Doolittle, 
J. Mol. Biol. 157:105 (1982)] of 9-residue peptides is 
plotted against the amino acid number. The corresponding 
positions of features of secondary structure predicted 
according to Garnier et al. [J. Mol. Biol. 157:165 
(1982)] are indicated in the lower panel. Amino acids 
comprising putative ATP-binding folds are underlined in 
Figure 1. Possible sites of phosphorylation by protein 
kinases A (PKA) or C (PKC) are indicated by open and 
closed circles, respectively. The open triangle is over 
the 3 bp (CTT) which are deleted in CF (see discussion 
below). The cDNA clones in Figure 1 were sequenced by 
the dideoxy chain termination method employing 35S- 
labelled nucleotides by the Dupont Genesis 2000™ 
automatic DNA sequencer. 

The combined cDNA sequence spans 6129 base pairs, 
excluding the poly(A) tail at the end of the 3' 
untranslated region, and it contains an ORF capable of 
encoding a polypeptide of 1480 amino acids (Figure 1). 
An ATG (AUG) triplet is present at the beginning of this 
ORF (base position 133-135). Since the nucleotide 
sequence surrounding this codon (5'-AGACCAUGCA-3') has 
the proposed features of the consensus sequence 
(CC)A/GCCAUGG(G) of a eukaryotic translation initiation site, 
with a highly conserved A at the -3 position, it is 
highly probable that this AUG corresponds to the first 
methionine codon for the putative polypeptide. 
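
The start-codon argument above can be checked with a few lines of Python. This is only a minimal sketch: the function names are hypothetical, the 10-base context string is the one quoted in the text, and the stop-codon coordinates are derived purely by arithmetic from the stated start position (133) and length (1480 amino acids), not read from Figure 1.

```python
def start_codon_context(seq, atg_index):
    """Return the bases at positions -3 and +4 relative to the A (+1) of an ATG."""
    seq = seq.upper().replace("U", "T")
    if seq[atg_index:atg_index + 3] != "ATG":
        raise ValueError("no ATG at the given index")
    minus3 = seq[atg_index - 3] if atg_index >= 3 else None
    plus4 = seq[atg_index + 3] if atg_index + 3 < len(seq) else None
    return minus3, plus4


def orf_coordinates(first_base, n_amino_acids):
    """Return the last coding base and the (start, end) of the stop codon."""
    last_coding_base = first_base + 3 * n_amino_acids - 1
    return last_coding_base, (last_coding_base + 1, last_coding_base + 3)


# Context quoted in the text: 5'-AGACCAUGCA-3'
context = "AGACCAUGCA"
atg_index = context.replace("U", "T").index("ATG")
minus3, plus4 = start_codon_context(context, atg_index)
has_purine_at_minus3 = minus3 in ("A", "G")

# ORF starting at base 133 and encoding 1480 amino acids
last_coding_base, stop_codon = orf_coordinates(133, 1480)
print(minus3, plus4, has_purine_at_minus3, last_coding_base, stop_codon)
```

Under these assumptions the quoted context carries an A at -3 (the highly conserved position) but a C rather than a G at +4, and the coding region would run from base 133 to base 4572.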

To obtain the sequence corresponding to the 5' end 
of the transcript, a primer-extension experiment was 
performed, as described earlier. As shown in Figure 10A, 
a primer extension product of approximately 216 
nucleotides could be observed, suggesting that the 5' end 
of the transcript initiated approximately 60 nucleotides 
upstream of the end of cDNA clone 10-1. A modified 
polymerase chain reaction (anchored PCR) was then used to 
facilitate cloning of the 5'-end sequences (Figure 10B). 
Two independent 5'-extension clones, one from pancreas 
and the other from T84 RNA, were characterized by DNA 
sequencing and were found to differ by only 1 base in 
length, indicating the most probable initiation site for 
the transcript as shown in Figure 1. 

Since most of the initial cDNA clones did not 
contain a polyA tail indicative of the end of an mRNA, 
anchored PCR was also applied to the 3' end of the 
transcript (Frohman et al, 1988, supra). Three 3'- 
extension oligonucleotides were made to the terminal 
portion of the cDNA clone T16-4.5. As shown in Figure 
10C, 3 PCR products of different sizes were obtained. 
All were consistent with the interpretation that the end 
of the transcript was approximately 1.2 kb downstream of 
the HindIII site at nucleotide position 5027 (see Figure 
1). The DNA sequence derived from representative clones 
was in agreement with that of the T84 cDNA clone T12a 
(see Figures 1 and 7) and the sequence of the 
corresponding 2.3 kb EcoRI genomic fragment. 
3.0 MOLECULAR GENETICS OF CF 

3.1 SITES OF EXPRESSION 

To visualize the transcript for the putative CF 
gene, RNA gel blot hybridization experiments were 
performed with the 10-1 cDNA as probe. The RNA 
hybridization results are shown in Figure 8. 

RNA samples were prepared from tissue samples 
obtained from surgical [...] methods [...] probes labeled 
to high specific activity by the random priming method 
(A.P. Feinberg and B. Vogelstein, Anal. Biochem. 132, 6 
(1983)) [...] (Am. J. Hum. Genet. 43, 645-663 (1988)). 

Figure 8 shows hybridization by the cDNA clone 10-1 
to a [...] transcript in the tissues indicated. Total RNA 
(10 µg) of each tissue [...] comparison to standard [...] 
hybridization signals were also detected in pancreas and 
primary cultures of cells from nasal polyps, suggesting 
that the mature mRNA of the putative CF gene is 
approximately 6.5 kb. Minor hybridization signals, 
probably representing degradation products, were detected 
at the lower size ranges but they varied between 
different experiments. Identical results were obtained 
with other cDNA clones as probes. Based on the 
hybridization band intensity and comparison with those 
detected for other transcripts under identical 
experimental conditions, it was estimated that the 
putative CF transcripts constituted approximately 0.01% 
of total mRNA in T84 cells. 

A number of other tissues were also surveyed by RNA 
gel blot hybridization analysis in an attempt to 
correlate the expression pattern of the 10-1 gene and the 
pathology of CF. As shown in Figure 8, transcripts, all 
of identical size, were found in lung, colon, sweat 
glands (cultured epithelial cells), placenta, liver, and 
parotid gland, but the signal intensities in these tissues 
varied among different preparations and were generally 
weaker than that detected in the pancreas and nasal 
polyps. Intensity varied among different preparations; 
for example, hybridization in kidney was not detected in 
the preparation shown in Figure 8, but could be discerned 
in subsequent repeated assays. No hybridization signals 
could be discerned in the brain or adrenal gland (Figure 
8), nor in skin fibroblast and lymphoblast cell lines. 

In summary, expression of the CF gene appeared to 
occur in many of the tissues examined, with higher levels 
in those tissues severely affected in CF. While this 
epithelial tissue-specific expression pattern is in good 
agreement with the disease pathology, no significant 
difference has been detected in the amount or size of 
transcripts from CF and control tissues, consistent with 
the assumption that CF mutations are subtle changes at 
the nucleotide level. 



WO 91/10734 



PCT/CA91/00009 



42 

3.2 THE MAJOR CF MUTATION 

Figure 16 shows the DNA sequence at the F508 
deletion. On the left is the reverse complement of the 
sequence from base position 1649-1664 of the normal 
sequence (as derived from the cDNA clone T16). The 
nucleotide sequence is displayed as the output (in 
arbitrary fluorescence intensity units, y-axis) plotted 
against time (x-axis) for each of the 2 photomultiplier 
tubes (PMT #1 and #2) of a Dupont Genesis 2000™ DNA 
analysis system. The corresponding nucleotide sequence 
is shown underneath. On the right is the same region 
from a mutant sequence (as derived from the cDNA clone 
C16). Double-stranded plasmid DNA templates were 
prepared by the alkaline lysis procedure. Five µg of 
plasmid DNA and 75 ng of oligonucleotide primer were used 
in each sequencing reaction according to the protocol 
recommended by Dupont, except that the annealing was done 
at 45°C for 30 min and that the elongation/termination 
step was for 10 min at 42°C. The unincorporated 
fluorescent nucleotides were removed by precipitation of 
the DNA sequencing reaction product with ethanol in the 
presence of 2.5 M ammonium acetate at pH 7.0 and rinsed 
one time with 70% ethanol. The primer used for the T16-1 
sequencing was a specific oligonucleotide, 
5'GTTGGCATGCTTTGATGACGCTTC3', spanning base position 
1708-1731, and that for C16-1 was the universal primer 
SK for the Bluescript vector (Stratagene). 

Figure 17 also shows the DNA sequence around the 
F508 deletion, as determined by manual sequencing. The 
normal sequence from base position 1726-1651 (from cDNA 
T16-1) is shown beside the CF sequence (from cDNA C16-1). 
The left panel shows the sequences from the coding 
strands obtained with the B primer 
(5'GTTTTCCTGGATTATGCCTGGCAC3') and the right panel those 
from the opposite strand with the D primer 
(5'GTTGGCATGCTTTGATGACGCTTC3'). The brackets indicate 
the three nucleotides in the normal that are absent in CF 
(arrowheads). Sequencing was performed as described in 
F. Sanger, S. Nicklen, A. R. Coulson, Proc. Natl. Acad. 
Sci. U.S.A. 74:5463 (1977). 
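
The effect of removing the three nucleotides (CTT) can be illustrated with a short Python sketch. The 15-base window used here (codons for Ile-Ile-Phe-Gly-Val) is an assumption taken from the generally published CFTR coding sequence around the deletion, not from the figures of this document, and the helper names and the five-entry codon table are hypothetical and cover only this window.

```python
# Partial codon table: only the codons needed for this illustrative window.
CODON_TABLE = {"ATC": "I", "ATT": "I", "TTT": "F", "GGT": "G", "GTT": "V"}


def translate(dna):
    """Translate complete codons of `dna` using the partial table above."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def delete_first(dna, motif):
    """Remove the first occurrence of `motif` from `dna`."""
    i = dna.find(motif)
    if i < 0:
        raise ValueError("motif not found")
    return dna[:i] + dna[i + len(motif):]


normal = "ATCATCTTTGGTGTT"            # assumed window around the deletion
mutant = delete_first(normal, "CTT")  # the 3 bp (CTT) deletion

normal_peptide = translate(normal)
mutant_peptide = translate(mutant)
print(normal, normal_peptide)
print(mutant, mutant_peptide)
```

Because exactly three bases are removed, the reading frame downstream is preserved: in this sketch the phenylalanine codon disappears while the flanking isoleucine, glycine and valine residues are unchanged.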

The extensive genetic and physical mapping data have 
directed molecular cloning studies to focus on a small 
segment of DNA on chromosome 7. Because of the lack of 
chromosome deletions and rearrangements in CF and the 
lack of a well-developed functional assay for the CF gene 
product, the identification of the CF gene required a 
detailed characterization of the locus itself and 
comparison between the CF and normal (N) alleles. 
Random, phenotypically normal, individuals could not be 
included as controls in the comparison due to the high 
frequency of symptomless carriers in the population. As 
a result, only parents of CF patients, each of whom by 
definition carries an N and a CF chromosome, were 
suitable for the analysis. Moreover, because of the 
strong allelic association observed between CF and some 
of the closely linked DNA markers, it was necessary to 
exclude the possibility that sequence differences 
detected between N and CF were polymorphisms associated 
with the disease locus. 

3.3 IDENTIFICATION OF RFLPs AND FAMILY STUDIES 

To determine the relationship of each of the DNA 
segments isolated from the chromosome walking and jumping 
experiments to CF, restriction fragment length 
polymorphisms (RFLPs) were identified and used to study 
families where crossover events had previously been 
detected between CF and other flanking DNA markers. As 
shown in Figure 14, a total of 18 RFLPs were detected in 
the 500 kb region; 17 of them (from E6 to CE1.0) are listed 
in Table 2; some of them correspond to markers previously 
reported. 

Five of the RFLPs, namely 10-1X.6, T6/20, H1.3 and 
CE1.0, were identified with cDNA and genomic DNA probes 
derived from the putative CF gene. The RFLP data are 
presented in Table 2, with markers in the MET and D7S8 
regions included for comparison. The physical distances 
between these markers as well as their relationship to 
the MET and D7S8 regions are shown in Figure 14. 

NOTES FOR TABLE 2 

(a) The number of N and CF-PI (CF with pancreatic 
insufficiency) chromosomes were derived from the 
parents in the families used in linkage analysis 
[Tsui et al, Cold Spring Harbor Symp. Quant. Biol. 
51:325 (1986)]. 

(b) Standardized association (A), which is less 
influenced by the fluctuation of DNA marker allele 
distribution among the N chromosomes, is used here 
for the comparison. Yule's association coefficient 
A = (ad-bc)/(ad+bc), where a, b, c, and d are the 
number of N chromosomes with DNA marker allele 1, CF 
with 1, N with 2, and CF with 2, respectively. 
Relative risk can be calculated using the 
relationship RR = (1+A)/(1-A) or its reverse. 

(c) Allelic association (Δ), calculated according to A. 
Chakravarti et al, Am. J. Hum. Genet. 36:1239 
(1984), assuming the frequency of 0.02 for CF 
chromosomes in the population, is included for 
comparison. 
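
The two formulas in note (b) can be sketched in Python as follows. The function names and the allele counts are hypothetical, chosen only for illustration; they are not values from Table 2.

```python
import math


def yule_association(a, b, c, d):
    """Yule's coefficient A = (ad - bc) / (ad + bc).

    a: N chromosomes with marker allele 1   b: CF chromosomes with allele 1
    c: N chromosomes with marker allele 2   d: CF chromosomes with allele 2
    """
    ad, bc = a * d, b * c
    if ad + bc == 0:
        raise ValueError("coefficient undefined when ad + bc == 0")
    return (ad - bc) / (ad + bc)


def relative_risk(A):
    """Relative risk RR = (1 + A) / (1 - A); infinite when A == 1."""
    if A == 1:
        return math.inf
    return (1 + A) / (1 - A)


# Hypothetical counts, for illustration only.
a, b, c, d = 10, 2, 5, 20
A = yule_association(a, b, c, d)
rr = relative_risk(A)
print(round(A, 3), round(rr, 3))
```

Substituting the definition of A shows that (1+A)/(1-A) simplifies to ad/bc, the cross-product ratio of the 2x2 table, which is why the sketch returns 20 for these counts (10 x 20 / (2 x 5)).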






Because of the small number of recombinant families 
available for the analysis, as was expected from the 
close distance between the markers studied and CF, and 
the possibility of misdiagnosis, alternative approaches 
were necessary in further fine mapping of the CF gene. 

Allelic association (linkage disequilibrium) has 
been detected for many closely linked DNA markers. While 
the utility of using allelic association for measuring 
genetic distance is uncertain, an overall correlation has 
been observed between CF and the flanking DNA markers. A 
strong association with CF was noted for the closer DNA 
markers, D7S23 and D7S122, whereas little or no 
association was detected for the more distant markers 
MET, D7S8 or D7S424 (see Figure 1). 

As shown in Table 2, the degree of association 
between DNA markers and CF (as measured by Yule's 
association coefficient) increased from 0.35 for metH and 
0.17 for J32 to 0.91 for 10-1X.6 (only CF-PI patient 
families were used in the analysis as they appeared to be 
genetically more homogeneous than CF-PS). The 
association coefficients appeared to be rather constant 
over the 300 kb from EG1.4 to H1.3; the fluctuations 
detected at several locations, most notably at H2.3A, 
E4.1 and T6/20, were probably due to the variation in the 
allelic distribution among the N chromosomes (see Table 
2). These data are therefore consistent with the result 
from the study of recombinant families (see Figure 14). 
A similar conclusion could also be made by inspection of 
the extended DNA marker haplotypes associated with the CF 
chromosomes (see below). However, the strong allelic 
association detected over the large physical distance 
between EG1.4 and H1.3 did not allow further refined 
mapping of the CF gene. Since J44 was the last genomic 
DNA clone isolated by chromosome walking and jumping 
before a cDNA clone was identified, the strong allelic 
association detected for the JG2E1-J44 interval prompted 
us to search for candidate gene sequences over this 
entire interval. It is of interest to note that the 
highest degree of allelic association was, in fact, 
detected between CF and the 2 RFLPs detected by 10-1X.6, 
a region near the major CF mutation. 

Table 3 shows pairwise allelic association between 
DNA markers closely linked to CF. The average number of 
chromosomes used in these calculations was 75-80 and only 
chromosomes from CF-PI families were used in scoring CF 
chromosomes. Similar results were obtained when Yule's 
standardized association (A) was used. 






[TABLE 3. Pairwise allelic association between DNA markers 
closely linked to CF. The table body is not recoverable 
from the scanned page.] 



Strong allelic association was also detected among 
subgroups of RFLPs on both the CF and N chromosomes. As 
shown in Table 3, the DNA markers that are physically 
close to each other generally appeared to have strong 
association with each other. For example, strong (in 
some cases almost complete) allelic association was 
detected between adjacent markers E6 and E7, between 
pH131 and W3D1.4, between the AccI and HaeIII polymorphic 
sites detected by 10-1X.6, and amongst EG1.4, JG2E1, 
E2.6(E.9), E2.8 and E4.1. The two groups of distal 
markers in the MET and D7S8 region also showed some 
degree of linkage disequilibrium among themselves but 
they showed little association with markers from E6 to 
CE1.0, consistent with the distant locations for MET and 
D7S8. On the other hand, the lack of association between 
DNA markers that are physically close may indicate the 
presence of recombination hot spots. Examples of these 
potential hot spots are the region between E7 and pH131, 
around H2.3A, and between J44 and the regions covered by the 
probes 10-1X.6 and T6/20 (see Figure 14). These regions, 
containing frequent recombination breakpoints, were 
useful in the subsequent analysis of extended haplotype 
data for the CF region. 
3.5 HAPLOTYPE ANALYSIS 

Extended haplotypes based on 23 DNA markers were 
generated for the CF and N chromosomes in the collection 
of families previously used for linkage analysis. 
Assuming recombination between chromosomes of different 
haplotypes, it was possible to construct several lineages 
of the observed CF chromosomes and, also, to predict the 
location of the disease locus. 

To obtain further information useful for 
understanding the nature of different CF mutations, the 
F508 deletion data were correlated with the extended DNA 
marker haplotypes. As shown in Table 4, five major 
groups of N and CF haplotypes could be defined by the 
RFLPs within or immediately adjacent to the putative CF 
gene (regions 6-8). 

TABLE 4 (continued) 
(a) The extended haplotype data are derived from the CF 
families used in previous linkage studies (see footnote 
(a) of Table 3) with additional CF-PS families collected 
subsequently (Kerem et al, Am. J. Hum. Genet. 44:827 (1989)). 
The data are shown in groups (regions) to reduce space. 
The regions are assigned primarily according to pairwise 
association data shown in Table 3, with regions 6-8 
spanning the putative CF locus (the F508 deletion is 
between regions 6 and 7). A dash (-) is shown at the 
region where the haplotype has not been determined due to 
incomplete data or inability to establish phase. 
Alternative haplotype assignments are also given where 
data are incomplete. Unclassified includes those 
chromosomes with more than 3 unknown assignments. The 
haplotype definitions for each of the 9 regions are: 



Region 1-   metD    metD    metH 
            BanI    TaqI    TaqI 
    A =      1       1       1 
    B =      2       1       2 
    C =      1       1       2 
    D =      2       2       1 
    E =      1       2 
    F =      2       1       1 
    G =      2       2       2 

Region 2-   E6      E7      pH131   W3D1.4 
            TaqI    TaqI    HinfI   HindIII 
    A =      1       2       2       2 
    B =      2       1       1       1 
    C =      1       2       1       1 
    D =      2       1       2       2 
    E =      2       2       2       1 
    F =      1       2       1       2 
    G =      1       1       2       2 

Region 3-   H2.3A 
            TaqI 
    A =      1 
    B =      2 

Region 4-   EG1.4   EG1.4   JG2E1 
            HincII  BglI    PstI 
    A =      1       1       2 
    B =      2       2       1 
    C =      2       2       2 
    D =      1       1       1 
    E =      1       2       1 

Region 5-   E2.6    E2.8    E4.1 
            MspI    NcoI    MspI 
    A =      2       1       2 
    B =      1       2       1 
    C =      2       2       2 

Region 6-   J44     10-1X.6 10-1X.6 
    A =      1       2       1 
    B =      2       1       2 
    C =      1       1       2 

Region 7-   T6/20 
            MspI 
    A =      1 
    B =      2 

Region 8-   H1.3    CE1.0 
            NcoI    NdeI 
    A =      2       1 
    B =      1       2 
    C =      1       1 
    D =      2       2 

Region 9-   J32     J3.11   J29 
            SacI    MspI    PvuII 
    A =      1       1       1 
    B =      2       2       2 
    C =      2       1       2 
    D =      2       2       1 
    E =      2       1       1 

(b) Number of chromosomes scored in each class: 
CF-PI(F) = CF chromosomes from CF-PI patients with 
the F508 deletion; 
CF-PS(F) = CF chromosomes from CF-PS patients with 
the F508 deletion; 
CF-PI = Other CF chromosomes from CF-PI patients; 
CF-PS = Other CF chromosomes from CF-PS patients; 
N = Normal chromosomes derived from carrier parents 

It was apparent that most recombinations between 
haplotypes occurred between regions 1 and 2 and between 
regions 8 and 9, again in good agreement with the 
relatively long physical distance between these regions. 
Other, less frequent, breakpoints were noted between 
short distance intervals and they generally corresponded 
to the hot spots identified by pairwise allelic 
association studies as shown above. It is of interest to 
note that the F508 deletion associated almost exclusively 
with Group I, the most frequent CF haplotype, supporting 
the position that this deletion constitutes the major 
mutation in CF. More important, while the F508 deletion 
was detected in 89% (62/70) of the CF chromosomes with 
the AA haplotype (corresponding to the two regions, 6 and 
7, flanking the deletion), it was not found in the 14 
N chromosomes within the same group (χ² = 47.3, p < 10^-4). 
The F508 deletion was therefore not a sequence 
polymorphism associated with the core of the Group I 
haplotype (see Table 5). 

Together, the results of the oligonucleotide 
hybridization study and the haplotype analysis support 
the fact that the gene locus described here is the CF 
gene and that the 3 bp (F508) deletion is the most common 
mutation in CF. 
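
The comparison of 62/70 CF chromosomes against 0/14 N chromosomes can be worked through as a 2x2 contingency test. The Python sketch below assumes an uncorrected Pearson chi-square with one degree of freedom; the text does not state which test was used, although this choice gives a statistic close to the reported 47.3. The function names are hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("a row or column total is zero")
    return n * (a * d - b * c) ** 2 / denominator


def p_value_1df(chi2):
    """Upper-tail probability of a chi-square variate with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


# Rows: CF and N chromosomes with the AA haplotype.
# Columns: deletion present, deletion absent.
cf_with, cf_without = 62, 70 - 62
n_with, n_without = 0, 14

chi2 = chi_square_2x2(cf_with, cf_without, n_with, n_without)
p = p_value_1df(chi2)
fraction_deleted = cf_with / (cf_with + cf_without)
print(round(chi2, 1), p < 1e-4, round(100 * fraction_deleted))
```

With these counts the statistic evaluates to about 47.3 and the tail probability is far below 10^-4, consistent with the figures quoted above.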
3.6 INTRON/EXON BOUNDARIES 

The entire genomic CF gene includes all of the 
regulatory genetic information as well as intron genetic 
information which is spliced out in the expression of the 
CF gene. Portions of the introns at the intron/exon 
boundaries for the exons of the CF gene are very helpful 
in locating mutations in the CF gene, as they permit PCR 
analysis from genomic DNA. Genomic DNA can be obtained 
from any tissue, including leukocytes from blood. Such 
intron information can be employed in PCR analysis for 
purposes of CF screening, which will be discussed in more 
detail in a later section. As set out in Figure 18 with 
the headings "Exon 1 through Exon 24", there are portions 
of the bounding introns, in particular those that flank 
the exons, which are essential for PCR exon amplification. 

Further assistance in interpreting the information 
of Figure 18 is provided in Figure 21. Genomic DNA 
clones containing the coding region of the CFTR gene are 
provided. As is apparent from Figure 21, there are 
considerable gaps between the clones of the exons, which 
indicate the gaps in the intron portions between the 
exons of Figure 18. These gaps in the intron portions 
are indicated by "...". In Figure 21, the clones were 
mapped using different restriction endonucleases (AccI,A; 
AvaI,W; BamHI,B; BglII,G; BssHI,Y; EcoRV,V; FspI,F; 
HincII,C; HindIII,H; KpnI,K; NcoI,J; PstI,P; PvuII,U; 
SmaI,M; SacI,S; SspI,E; StyI,T; XbaI,X; XhoI,O). In 
Figure 21, the exons are represented by boxed regions. 
The open boxes indicate non-coding portions of the exons, 
whereas closed boxes indicate coding portions. The 
probable positions of the exons within the genomic DNA 
are also indicated by their relative positions. The 
arrows above the boxes mark the location of the 
oligonucleotides used as sequencing primers in the PCR 
amplification of the genomic DNA. The numbers provided 
beneath the restriction map represent the size of the 
restriction fragments in kb. 

In sequencing the intron portions, it has been 
determined that there are at least 27 exons instead of 
the previously reported 24 exons in applicants' 
aforementioned co-pending applications. Exons 6, 14 and 
17, as previously reported, are found to be in segments 
and are now named exons 6a, 6b, exons 14a, 14b and exons 
17a, 17b. 

The intron portions, which have been used in PCR 
amplification, are identified in the following Table 5 
and underlined in Figure 18. The portions identified by 
the arrows are selected, but it is understood that other 
portions of the intron sequences are also useful in the 
PCR amplification technique. For example, for exon 10 
the relevant genetic information which is preferred in 
PCR is noted by reference to the 5' and 3' ends of the 
sequence. The intron section is identified with an "i". 
Hence in Table 5 for exon 2, the preferred portions are 
identified by 2i-5 and 2i-3 and similarly for exons 3 
through 24. For exon 1, the selected portions include 
the sequence GGA...AAA for B115-B and ACA...GTG for 10D. 
For exon 13, portions are identified by two sets: 13i-5 
and C1-1M, and X13B-5 and 13i-3A. (This exon (13) is 
large and most practical to be completed in two 
sections.) C1-1M and X13B-5 are from exon sequences. 
The specific conditions for PCR amplification of 
individual exons are summarized in the following Table 6 
and are discussed in more detail hereinafter with respect 
to the procedure explained in R.K. Saiki et al. Science 
230:1350 (1985). 

These oligonucleotides, as derived from the intron 
sequence, assist in amplifying by PCR the respective 
exon, thereby providing for analysis for DNA sequence 
alterations corresponding to mutations of the CF gene. 
The mutations can be revealed by either direct sequence 
determination of the PCR products or sequencing the 
products cloned in plasmid vectors. The amplified exon 
can also be analyzed by use of gel electrophoresis in the 
manner to be further described. It has been found that 
the sections of the intron for each respective exon are 
of sufficient length to work particularly well with PCR 
technique to provide for amplification of the relevant 
exon. 

TABLE 5 



Oligonucleotides used for amplification of CF gene exons by PCR 

Exon     PCR primers; 5'-> 3'                   Amplified product (bp) 

1        GGAGTTCACTCACCTAAA (B115-B)            933 
         ACACOCCCTCClCn-ltXnXi (10D) 
2        CCAAATCTGTATGOAGACCA (2i-5)            378 
         TATGTTGCCCAGGCTGGTAT (2i-3) 
3        CTTOGGTTAATCTCCTTGGA (3i-5)            309 
         ATTCACCAGATTTCGTAGTC (3i-3) 
4        TCACATATGGTATGACCCTC (4i-5)            438 
         TTGTACCAGCTCACTACCTA (4i-3) 
5        ATTTCTGCCTAGATGCTGGG (5i-5)            395 
         AACTOOGCCTTTCCAGTTGT (5i-3) 
6a       TTAGTGTGCTCAGAACCACG (6Ai-5)           385 
         CTATGCATAGAGCAGTCCTG (6Ai-3) 
6b       TGGAATGAGTCTGTACAGCG (6Ci-5)           417 
         GAGGTGGAAGTCTACCATGA (6Ci-3) 
7        AGAOCATGCTCAGATCTTCCAT (7i-5)          410 
         GCAAAGTTCATTAGAACTGATC (7i-3) 
8        TOAATCCTAGTGCTTGGCAA (8i-5)            359 
         TCGCCATTAGGATGAAATCC (8i-3) 
9        TAATGGATCATGGGCCATGT (9i-5)            560 
         ACAGTOTTOAATGTGGTCCA (9i-3) 
10       GCAGAGTACCTGAAACAGGA (10i-5)           491 
         CATTCACAGTAGCTTACCCA (10i-3) 
11       CAACTOTGGTTAAAGCAATAGTGT (11i-5)       425 
         GCACAGATTCTGAGTAACCATAAT (11i-3) 
12       GTGAATCGATGTGGTGAOCA (12i-5)           426 
         CTGGTTTAGCATGAGGCGGT (12i-3) 
13 (a)   TGCTAAAATACGAOACATATTGCA (13i-5)       528 
         ATCTGGTACTAAGGACAG (C1-1M) 
   (b)   TCAATCCAATCAACTCTATACGAA (X13B-5)      497 
         TACACCTTATCCTAATCCTATGAT (13i-3A) 
14a      AAAAGGTATOCCACTGTTAAG (14Ai-5)         511 
         GTATACATOCCCAAACTATCT (14Ai-3) 
14b      GAACACCTAGTACAGCTGCT (14Bi-5)          449 
         AACrCCTGGGCTCAAGTCAT (14Bi-3) 
15       GTOCATGCTCTICTAATOCA (15i-5)           485 
         AAGGCACATGCCTCTGTOCA (15i-3) 
16       CAGAGAAATTGGTOGTTACT (16i-5)           570 
         ATCTAAATGTGGGATTGCCT (16i-3) 
17a      CAATGTGCACATGTACCCTA (17Ai-5)          579 
         TGTACACCAACTGTGGTAAG (17Ai-3) 
17b      TTCAAAGAATGGCACCAGTGT (17Bi-5)         463 
         ATAACCTATAGAATGCAGCA (17Bi-3) 
18       GTAGATGCTGTOATGAACTG (18i-5)           451 
         AGTGGCTATCTATGAGAAGG (18i-3) 
19       GCCCGACAAATAACCAAGTGA (19i-5)          454 
         GCTAACACATTGCTTCAGGCT (19i-3) 
20       GGTCAGGATTOAAAGTGTGCA (20i-5)          473 
         CTATGAGAAAACTGCACTGGA (20i-3) 
21       AATGTTCACAAGGGACTCCA (21i-5)           477 
         CAAAAGTACCTGTIGCrCCA (21i-3) 
22       AAACGCTOAGCCICACAAGA (22i-5)           562 
         TGTCACCATOAAGCAGGCAT (22i-3) 
23       AGCTOATTGTGCGTAACGCT (23i-5)           400 
         TAAAGCTGGATGGCTGTATG (23i-3) 
24       GGACACAGCAGTTAAATGTO (24i-5)           569 
         ACTATTGOCAGGAAGCCATT (24i-3) 

TABLE 6 



                                 Initial        Thermal cycle 
Exon                  Buffer*    denaturation   Denaturation   Annealing     Extension      Final extension 
                                 time/temp      time/temp      time/temp     time/temp      time/temp 

3-5, 6a, 6b, 7-10,    A(1.5)     6 min/94°C 
12, 14a, 16, 17b, 
18-24 
1                     B          6 min/94°C     30 sec/94°C    30 sec/55°C   2.5 min/72°C   7 min/72°C 
2, 11                 B          6 min/94°C     30 sec/94°C    30 sec/52°C   1 min/72°C     7 min/72°C 
13a                   A(1.75)    6 min/94°C     30 sec/94°C    30 sec/54°C   2.5 min/72°C   7 min/72°C 
13b                   A(1.75)    6 min/94°C     30 sec/94°C    30 sec/52°C   2.5 min/72°C   7 min/72°C 
14b                   B          6 min/94°C     30 sec/94°C    30 sec/56°C   1 min/72°C     7 min/72°C 
17a                   A(1.5)     6 min/94°C     30 sec/94°C    30 sec/56°C   1 min/72°C     7 min/72°C 

(*) Buffer A(1.5): A buffer with 1.5 mM MgCl2 
Buffer A(1.75): A buffer with 1.75 mM MgCl2 
Buffer B: 67 mM Tris-HCl pH 8.8, [...] MgCl2, [...] (NH4)2SO4, [...] µM EDTA, 
10 mM β-mercaptoethanol, 170 µg/ml BSA, 10% DMSO, 1.5 mM of each dNTP 
Buffer A contains: [...] pH 8.3, 50 mM [...] 
dNTPs = deoxynucleotide triphosphates 

3.7 CF MUTATIONS - ΔI506 OR ΔI507 

The association of the F508 deletion with 1 common 
and 1 rare CF haplotype provided further insight into the 
number of mutational events that could contribute to the 
present patient population. Based on the extensive 
haplotype data, the original chromosome in which the F508 
deletion occurred is likely to carry the haplotype 
-AAAAAAA- (Group Ia), as defined in Table 4. The other 
Group I CF chromosomes carrying the deletion are probably 
recombination products derived from the original 
chromosome. If the CF chromosomes in each haplotype 
group are considered to be derived from the same origin, 
only 3-4 additional mutational events would be predicted 
(see Table 4). However, since many of the CF chromosomes 
in the same group are markedly different from each other, 
further subdivision within each group is possible. As a 
result, a higher number of independent mutational events 
could be considered and the data suggest that at least 7 
additional, putative mutations also contribute to the CF- 
PI phenotype (see Table 3). The mutations leading to the 
CF-PS subgroup are probably more heterogeneous. 

The 7 additional CF-PI mutations are represented by 
the haplotypes: -CAAAAAA- (Group Ib), -CABCAAD- (Group 
Ic), BBBAC- (Group IIa), -CABBBAB- (Group Va). 
Although the molecular defect in each of these mutations 
has yet to be defined, it is clear that none of these 
mutations severely affect the region corresponding to the 
oligonucleotide binding sites used in the 
PCR/hybridization experiment. 

One CF chromosome hybridizing to the ΔF508-ASO 
probe, however, has been found to associate with a 
different haplotype (group IIIa). It appeared that the 
ΔF508 deletion had occurred in both haplotypes but, with 
the discovery of ΔI507, this was found not to be the case. 
Instead, the ΔF508 is in group Ia, whereas the ΔI507 is 
in group IIIa. None of the other CF nor the normal 
chromosomes of this haplotype group (IIIa) have shown 
hybridization to the mutant (ΔF508) ASO [B. Kerem et al, 
Science 245:1073 (1989)]. In view of the group Ia and 
IIIa haplotypes being distinctly different from each 
other, the mutations harbored by these two groups of CF 
chromosomes must have originated independently. To 
investigate the molecular nature of the mutation in this 
group IIIa CF chromosome, we further characterized the 
region of interest through amplification of the genomic 
DNA from an individual carrying the chromosome IIIa by 
the polymerase chain reaction (PCR). 

These polymerase chain reactions (PCR) were
performed according to the procedure of R.K. Saiki et al,
Science 230:1350 (1985). A specific DNA segment of 491
bp including exon 10 of the CF gene was amplified with
the use of the oligonucleotide primers 10i-5
(5'-GCAGAGTACCTGAAACAGGA-3') and 10i-3
(5'-CATTCACAGTAGCTTACCCA-3') located in the 5' and 3'
flanking regions, respectively, as shown in Figure 18 and
itemized in Table 5. Both oligonucleotides were
purchased from the HSC DNA Biotechnology Service Center
(Toronto). Approximately 500 ng of genomic DNA from
cultured lymphoblastoid cell lines of the parents and the
CF child of Family 5 was used in each reaction. The DNA
samples were denatured at 94°C for 30 sec, primers
annealed at 55°C for 30 sec, and extended at 72°C for 50
sec (with 0.5 unit of Taq polymerase, Perkin-
Elmer/Cetus, Norwalk, CT) for 30 cycles, with a final
extension period of 7 min, in a Perkin-Elmer/Cetus DNA
Thermal Cycler. Reaction conditions for PCR
amplification of other exons are set out in Table 6.
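
As a rough check on the cycling program just described, the following Python sketch adds up the programmed hold times and reports simple properties of the two exon 10 primers. It is an illustration only: the function names are hypothetical, ramp times between temperatures are ignored, and no initial denaturation step is included because none is stated for this reaction.

```python
# Illustrative sketch: hold times and primer sequences are taken from the
# text; ramp times and any initial denaturation are NOT modelled.
CYCLES = 30
STEPS_SEC = {"denature_94C": 30, "anneal_55C": 30, "extend_72C": 50}
FINAL_EXTENSION_SEC = 7 * 60

PRIMERS = {
    "10i-5": "GCAGAGTACCTGAAACAGGA",
    "10i-3": "CATTCACAGTAGCTTACCCA",
}


def total_hold_time_sec(cycles, steps, final_extension):
    """Sum of programmed hold times over all cycles plus final extension."""
    return cycles * sum(steps.values()) + final_extension


def gc_fraction(seq):
    """Fraction of G and C bases in an oligonucleotide."""
    return sum(base in "GC" for base in seq) / len(seq)


total = total_hold_time_sec(CYCLES, STEPS_SEC, FINAL_EXTENSION_SEC)
print(f"programmed hold time: {total} s ({total / 60:.0f} min)")
for name, seq in PRIMERS.items():
    print(f"{name}: {len(seq)} nt, GC fraction {gc_fraction(seq):.2f}")
```

Under these assumptions the program holds for 3720 s (62 min); the real run time on a thermal cycler would be longer because of ramping.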

Hybridization analysis of the PCR products from
three individuals of Family 5 of group IIIa was
performed. The carrier mother and father are represented
by a half-filled circle and square, respectively, and the
affected son is a filled square in Figure 19a. The
conditions for hybridization and washing have been
previously described (Kerem et al, supra). There is a
relatively weak signal in the father's PCR product with
the mutant (oligo ΔF508) probe. In Figure 19b, DNA
sequence analysis of the clone 5-3-15 and the PCR
products from the affected son and the carrier father are
shown. The arrow in the center panel indicates the
presence of both A and T nucleotide residues in the same
position; the arrow in the right panel indicates the
point of divergence between the normal and the ΔI507
sequence. The sequence ladders shown are derived from
the reverse-complements, as will be described later.
Figure 19c shows the DNA sequences and their
corresponding amino acid sequences of the normal, ΔI507,
and ΔF508 alleles spanning the mutation sites.
With reference to Figure 19a, the PCR-amplified DNA from
the carrier father, who contributed the group IIIa CF
chromosome to the affected son, hybridized less
efficiently with the ΔF508 ASO than that from the mother,
who carried the group Ia CF chromosome. The difference
became apparent when the hybridization signals were
compared to those with the normal ASO probe. This result
therefore indicated that the mutation carried by the
group IIIa CF chromosome might not be identical to ΔF508.

To define the nucleotide sequence corresponding to
the mutant allele on this chromosome, the PCR-amplified
product of the father's DNA was excised from a
polyacrylamide-electrophoretic gel and cloned into a
sequencing vector.

The general procedures for DNA isolation and
purification for purposes of cloning into a sequencing
vector are described in J. Sambrook, E.F. Fritsch, T.
Maniatis, Molecular Cloning: A Laboratory Manual, 2nd ed.
(Cold Spring Harbor Press, N.Y. 1989). The two
homoduplexes generated by PCR amplification of the
paternal DNA were purified from a 5% non-denaturing
polyacrylamide gel (30:1 acrylamide:bis-acrylamide). The
appropriate bands were visualized by staining with
ethidium bromide, excised and eluted in TE (10 mM Tris-
HCl; 1 mM EDTA; pH 7.5) for 2 to 12 hours at room
temperature. The DNA solution was sequentially treated
with Tris-equilibrated phenol, phenol/CHCl3 and CHCl3.
The DNA samples were concentrated by precipitation in
ethanol and resuspension in TE, incubated with T4
polynucleotide kinase in the presence of ATP, and ligated
into dephosphorylated, blunt-ended Bluescript KS+ vector
(Stratagene, San Diego, CA). Clones containing amplified
product generated from the normal parental chromosome
were identified by hybridization with the oligonucleotide
N as described in Kerem et al, supra.

Clones containing the mutant sequence were
identified by their failure to hybridize to the normal
ASO (Kerem et al, supra). One clone, 5-3-15, was isolated
and its DNA sequence determined. The general protocol
for sequencing cloned DNA is essentially as described
[J.R. Riordan et al, Science 245:1066 (1989)] with the
use of a U.S. Biochemicals Sequenase kit. To verify the
sequence and to exclude any errors introduced by DNA
polymerase during PCR, the DNA sequences for the PCR
products from the father and one of the affected children
were also determined directly without cloning.

This procedure was accomplished by denaturing 2
pmoles of gel-purified double-stranded PCR product in 0.2
M NaOH/0.2 mM EDTA (5 min at room temperature),
neutralizing by adding 0.1 volume of 2 M ammonium acetate
(pH 5.4) and precipitating with 2.5 volumes of ethanol
at -70°C for 10 min. After washing with 70% ethanol, the
DNA pellet was dried and redissolved in a sequencing
reaction buffer containing 4 pmoles of the
oligonucleotide primer 10i-3 of Figure 18, dithiothreitol
(8.3 mM) and [α-35S]-dATP (0.8 μM, 1000 Ci/mmole). The
mixture was incubated at 37°C for 20 min, following
which 2 μl of labelling mix, as included in the
Sequenase kit, and then 2 units of Sequenase enzyme were
added. Aliquots of the reaction mixture (3.5 μl) were
transferred, without delay, to tubes each containing 2.5
μl of ddGTP, ddATP, ddTTP and ddCTP solutions (U.S.
Biochemicals Sequenase kit) and the reactions were
stopped by addition of the stop solution.

The DNA sequence for this mutant allele is shown in
Figure 19b. The data derived from the cloned DNA and
direct sequencing of the PCR products of the affected
child and the father are all consistent with a 3 bp
deletion when compared to the normal sequence (Figure
19c). The deletion of this 3 bp (ATC) at the I506 or
I507 position results in the loss of an isoleucine
residue from the putative CFTR, within the same ATP-
binding domain where ΔF508 resides, but it is not evident
whether this deleted amino acid corresponds to
position 506 or 507. Since the 506 and 507 positions are
repeats, it is at present impossible to determine in
which position the 3 bp deletion occurs. For convenience
in later discussions, however, we refer to this deletion
as ΔI507.
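
The ambiguity between positions 506 and 507 can be illustrated with a short Python sketch. The 21-nucleotide sense-strand sequence used here (codons 504 to 510, Glu-Asn-Ile-Ile-Phe-Gly-Val) is an assumption based on the commonly reported sequence of this region and is not quoted from the text; Figure 19c remains the authoritative source. The helper names are hypothetical.

```python
# Assumption: sense-strand sequence of codons 504-510 as commonly reported
# for this region; it is NOT quoted from the text (see Figure 19c).
NORMAL = "GAAAATATCATCTTTGGTGTT"
FIRST_CODON = 504

# Only the codons that occur in this short window.
CODON_TABLE = {"GAA": "E", "AAT": "N", "ATC": "I", "ATT": "I",
               "TTT": "F", "GGT": "G", "GTT": "V"}


def codon_start(codon_number):
    """0-based index of the first base of a codon within NORMAL."""
    return (codon_number - FIRST_CODON) * 3


def delete(seq, start, length=3):
    """Return seq with `length` bases removed beginning at `start`."""
    return seq[:start] + seq[start + length:]


def translate(seq):
    """Translate an in-frame sequence using the small table above."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


del_506 = delete(NORMAL, codon_start(506))       # remove ATC of codon 506
del_507 = delete(NORMAL, codon_start(507))       # remove ATC of codon 507
del_f508 = delete(NORMAL, codon_start(508) - 1)  # remove CTT spanning 507/508

# Positions at which the I507 and F508 deletion alleles differ.
differences = [i for i, (a, b) in enumerate(zip(del_507, del_f508)) if a != b]

print(del_506 == del_507)   # True: the two deletions are indistinguishable
print(translate(NORMAL), translate(del_507), translate(del_f508))
print(differences)          # [9]: a single T/A difference
```

Because the two ATC codons are tandem repeats, removing either one yields the same sequence, and the resulting allele differs from the F508 deletion allele at a single base in this window.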

The fact that the ΔI507 and ΔF508 mutations occur in
the same region of the presumptive ATP-binding domain of
CFTR is surprising. Although the entire sequence of the
ΔI507 allele has not been examined, as has been done for
ΔF508, the strategic location of the deletion argues that
it is the responsible mutation for this allele. This
argument is further supported by the observation that
this alteration was not detected in any of the normal
chromosomes studied to date (Kerem et al, supra). The
identification of a second single amino acid deletion in
the ATP-binding domain of CFTR also provides information
about the structure and function of this protein. Since
deletion of either the phenylalanine residue at position
508 or the isoleucine at position 507 is sufficient to
affect the function of CFTR such that it causes CF
disease, it is suggested that these residues are involved
in the folding of the protein but not directly in the
binding of ATP. That is, the length of the peptide is
probably more important than the actual amino acid
residues in this region. In support of this hypothesis,
it has been found that the phenylalanine residue can be
replaced by a serine, and the isoleucine at position 506
by valine, without apparent loss of function of CFTR.
When the nucleotide sequence of ΔI507 is compared to
that of ΔF508 at the ASO-hybridizing region, it is noted
that the difference between the two alleles is only an
A → T change (Figure 19c). This subtle difference thus
explains the cross-hybridization of the ΔF508-ASO to
ΔI507. These results therefore exemplify the
importance of careful examination of both parental
chromosomes in performing ASO-based genetic diagnosis.
It has been determined that the ΔF508 and ΔI507 mutations
can be distinguished by increasing the stringency of the
oligonucleotide hybridization conditions or by detecting
the unique mobility of the heteroduplexes formed between
each of these sequences and the normal DNA on a
polyacrylamide gel. The stringency of hybridization can
be increased by using a washing temperature of 45°C
instead of the prior 39°C in the presence of 2×SSC (1×SSC
= 150 mM NaCl and 15 mM Na citrate).
Identification of the ΔI507 and ΔF508 alleles by
polyacrylamide gel electrophoresis is shown in Figure 20.
The PCR products were prepared from the three family
members and separated on a 5% polyacrylamide gel as
described above. A DNA sample from a known heterozygous
ΔF508 carrier is included for comparison. With reference
to Figure 20, the banding pattern of the PCR-amplified
genomic DNA from the father, who is the carrier of ΔI507,
is clearly distinguishable from that of the mother, who
is a carrier of the ΔF508 mutation. In
this gel electrophoresis test, there were actually three
individuals (the carrier father and the two affected sons
in Family 5) who carried the ΔI507 deletion. Since they
all belong to the same family, they represent only a
single CF chromosome in our population analysis [Kerem et
al, supra]. The two patients, who also inherited the ΔF508
mutation from their mother, showed typical symptoms of CF
with pancreatic insufficiency. The father of this family
was the only parent who carried this ΔI507 mutation; no
other CF parents showed a reduced hybridization intensity
signal with the ΔF508 mutant oligonucleotide probe or a
peculiar heteroduplex pattern for the PCR product (as
defined above) in the retrospective study. In addition,
two representatives of the group IIIb and one of the
group IIIc CF chromosomes from our collection [Kerem et
al, supra] were sequenced, but none were found to contain
ΔI507. Since the electrophoresis technique eliminates
the need for probe-labelling and hybridization, it may
prove to be the method of choice for detecting carriers
on a large population scale [J. M. Rommens et al, Am. J.
Hum. Genet. 46:395-396 (1990)].

The present data also indicate that there is a
strict correlation between DNA marker haplotype and
mutation in CF. The ΔF508 deletion is the most common CF
mutation that occurred on a group Ia chromosome
background [Kerem et al, supra]. The ΔI507 mutation is,
however, rare in the CF population; the one group IIIa CF
chromosome carrying this deletion is the only example in
our studied population (1/219). Since the group III
haplotype is relatively common among the normal
chromosomes (17/198), the ΔI507 deletion probably
occurred recently. Additional studies with larger
populations of different geographic and ethnic
backgrounds should provide further insight in
understanding the origins of these mutations.

3.8 ADDITIONAL CF MUTATIONS

Following the above procedures, other mutations in
the CF gene have been identified. The following brief
description of each identified mutation is based on the
previously described procedures for locating the mutation
involving use of PCR procedures. The mutations are given
short form names. The numbering used in these
abbreviations refers to either the DNA sequence or the
amino acid sequence position of the mutation depending on
the type of mutation. For example, splice mutations and
frameshift mutations are defined using the DNA sequence
position. Most other mutations derive their nomenclature
from the amino acid residue position. The description of
each mutation clarifies the nomenclature in any event.

For example, mutations G542X, Q493X, 3659 del C and 556
del A result in shortened polypeptides significantly
different from the single amino acid deletions or
alterations. G542X and Q493X involve a polypeptide
including only the first 541 and 493 amino acid residues,
respectively, of the normal 1480 amino acid polypeptide.
3659 del C and 556 del A also involve shortened versions
and will include additional amino acid residues.
Mutations 711+1G → T and 1717-1G → A are predicted to lead
to polypeptides which cannot as yet be exactly
defined. They probably do lead to shortened polypeptides
but could contain additional amino acids. DNA sequences
encoding these mutant polypeptides will probably
contain intron sequence from the normal gene or possibly
deleted exons.
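
The naming convention described above can be sketched as a small classifier. The regular expressions below are hypothetical and cover only the name styles that appear in this section; they are not a general parser for mutation nomenclature, and the "->" arrow spelling is used for convenience.

```python
import re

# Hypothetical patterns covering only the name styles used in this section.
PATTERNS = [
    ("nonsense", "amino acid", re.compile(r"^[A-Z]\d+X$")),
    ("missense", "amino acid", re.compile(r"^[A-Z]\d+[A-WYZ]$")),
    ("frameshift deletion", "nucleotide",
     re.compile(r"^\d+\s*del\s*[ACGT]$", re.IGNORECASE)),
    ("splice site", "nucleotide",
     re.compile(r"^\d+\s*[+-]\s*\d+[ACGT]\s*(?:->|→)\s*[ACGT]$")),
    ("nucleotide substitution", "nucleotide",
     re.compile(r"^\d+[ACGT]\s*(?:->|→)\s*[ACGT]$")),
]


def classify(name):
    """Return (mutation type, numbering system) for a short-form name."""
    for kind, numbering, pattern in PATTERNS:
        if pattern.match(name.strip()):
            return kind, numbering
    return "unrecognized", None


for name in ["G542X", "Q493X", "G85E", "3659 del C", "556 del A",
             "711+1G->T", "1717-1G->A", "129G->C"]:
    print(name, classify(name))
```

In this sketch, names built from one-letter amino acid codes are numbered by residue position, while deletions, splice-site changes and plain base substitutions are numbered by nucleotide position, matching the convention stated in the text.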

MUTATIONS IN EXON 1
In the 129G → C mutation, there is a single basepair
change of G to C at nucleotide 129 of the cDNA sequence
of Figure 1. The PCR product for amplifying genomic DNA
containing this mutation is derived from the B115-B and
10D primers as set out in Table 5. The genomic DNA is
amplified as per the conditions of Table 6.

3.8.1 MUTATIONS IN EXON 3
The G85E mutation in exon 3 involves a G to A
transition at nucleotide position 386. It is detected in
family #26, a French Canadian family classified as PI.
This predicted Gly to Glu amino acid change is associated
with a group IIb haplotype. The mutation destroys a
HinfI site. The PCR product derived from the 3i-5 and
3i-3 primers, as per conditions of Table 6, is cleaved by
this enzyme into 3 fragments of 172, 105 and 32 bp,
respectively, for the normal sequence; a fragment of 277
bp would be present for the mutant sequence. We analyzed
54 CF chromosomes, 8 from group II, and 50 normal
chromosomes, 44 from group II, and did not find another
example of G85E.
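
The fragment arithmetic for this restriction assay can be sketched as follows. The text gives only the fragment sizes, so the order of the fragments along the PCR product (172, then 105, then 32 bp) is an assumption made for illustration, and the function name is hypothetical.

```python
def fragment_sizes(product_length, cut_positions):
    """Sizes of the fragments produced by cutting a linear PCR product
    at the given positions (measured in bp from one end)."""
    bounds = [0] + sorted(cut_positions) + [product_length]
    return [right - left for left, right in zip(bounds, bounds[1:])]


# Assumption: fragments lie in the order 172, 105, 32 along the product.
PRODUCT_LENGTH = 172 + 105 + 32
NORMAL_CUTS = [172, 277]   # two HinfI sites in the normal product
MUTANT_CUTS = [277]        # the site between the 172 and 105 bp pieces is lost

print(fragment_sizes(PRODUCT_LENGTH, NORMAL_CUTS))  # [172, 105, 32]
print(fragment_sizes(PRODUCT_LENGTH, MUTANT_CUTS))  # [277, 32]
```

Whatever the true order, the 277 bp mutant fragment equals 172 + 105 bp, which implies that the destroyed site separates those two pieces in the normal product.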

3.8.2 MUTATIONS IN EXON 4

556 del A is a frameshift mutation in exon 4 in a
single CF chromosome (Toronto family #17, GM1076). There
is a deletion of A at nucleotide position 556. This
mutation is associated with the Group IIIb haplotype and is
not found in 31 other CF chromosomes (9 from IIIb) and 30
N chromosomes (16 from IIIb). The mutation creates a BglI
enzyme cleavage site. The PCR primers are 4i-5 and 4i-3
(see Table 5); the enzyme cuts the mutant PCR
product (437 bp) into 2 fragments of 287 and 150 bp in
size.

The I148T mutation in exon 4 involves a T to C
basepair transition at nucleotide position 575. This
results in an Ile to Thr change at amino acid position
148 of Figure 1. Genomic DNA containing this mutation
is amplified with primers 4i-5
and 4i-3 as set out in Table 5. The reaction conditions
for amplifying the genomic DNA are set out in Table 6.

3.8.3 MUTATIONS IN EXON 5

In mutation G178R, the Gly to Arg missense mutation
in exon 5 is due to a G to A change at nucleotide
position 664. The mutation is found on the mother's CF
chromosome in family #50; the other mutation in this
family is ΔF508. Primers 5i-5 and 5i-3 were used for
amplifying genomic DNA as outlined in Tables 5 and 6.

3.8.4 MUTATIONS IN EXON 9

A mutation in exon 9 is a change of alanine (GCG) to
glutamic acid (GAG) at amino acid position 455
(A455 → E). Two of the 38 non-ΔF508 CF chromosomes
examined carry this mutation; both of them are from
patients of French-Canadian origin, which we have
identified in our work as families #27 and #53, and they
belong to haplotype group Ib. The mutation is detectable
by allele-specific oligonucleotide (ASO) hybridization
with PCR-amplified genomic DNA sequence. The PCR primers
are 9i-5 (5'-TAATGGATCATGGGCCATGT-3') and 9i-3
(5'-ACAGTGTTGAATGTGGTGCA-3') for amplifying genomic DNA under
the conditions of Table 6. The ASOs are
5'-GTTGTTGGCGGTTGCT-3' for the normal allele and
5'-GTTGTTGGAGGTTGCT-3' for the mutant. The oligonucleotide
hybridization is as described in Kerem et al (1989) supra
at 37°C and the washings are done twice with 5×SSC for 10
min each at room temperature followed by twice with
2×SSC for 30 min each at 52°C. Although the alanine at
position 455 (Ala455) is not present in all ATP-binding
folds across species, it is present in all known members
of the P-glycoprotein family, the protein most similar to
CFTR. Further, A455 → E is believed to be a mutation
rather than a sequence polymorphism because the change is
not found in 16 non-ΔF508 CF chromosomes and three normal
chromosomes carrying the same group I haplotype.
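
A quick way to confirm that an ASO pair distinguishes the intended base change is to compare the two probes position by position. The sketch below does this for the exon 9 ASOs quoted above; the helper name is hypothetical, and reading the three bases centred on the mismatch as codon 455 is an assumption.

```python
def mismatches(normal, mutant):
    """List (index, normal_base, mutant_base) where two probes differ."""
    if len(normal) != len(mutant):
        raise ValueError("probes must have the same length")
    return [(i, a, b) for i, (a, b) in enumerate(zip(normal, mutant)) if a != b]


NORMAL_ASO = "GTTGTTGGCGGTTGCT"
MUTANT_ASO = "GTTGTTGGAGGTTGCT"

diffs = mismatches(NORMAL_ASO, MUTANT_ASO)
print(diffs)  # [(8, 'C', 'A')]

# Assumption: the three bases centred on the mismatch form codon 455.
i = diffs[0][0]
print(NORMAL_ASO[i - 1:i + 2], MUTANT_ASO[i - 1:i + 2])  # GCG GAG
```

The probes differ at one central position, and the surrounding triplets read GCG and GAG, in agreement with the alanine to glutamic acid change described.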

3.8.5 MUTATIONS IN EXON 10

In the Q493X mutation, Gln493 (CAG) is changed into a
stop codon (TAG) in Toronto family #9 (nucleotide
position 1609 C → T). The mutation occurs on a CF
chromosome with haplotype IIIb; it is not found in 28
normal chromosomes (15 of which belong to IIb) nor in 33
other CF chromosomes (5 of which are IIIb). The mutation can
be detected by allele-specific PCR, with 10i-5 as the
common PCR primer, 5'-GGCATAATCCAGGAAAACTG-3' for the
normal sequence and 5'-GGCATAATCCAGGAAAACTA-3' for the
mutant allele. The PCR condition is 6 min at 94°
followed by cycles of 30 sec at 94°, 30 sec at 57° and 90
sec at 72°, with 100 ng of each primer and ~400 ng
genomic DNA. The primers 9i-3 and 9i-5 may be used for
an internal PCR control as they share the same reaction
condition.
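
In allele-specific PCR the discriminating base sits at the 3' end of the primer. The sketch below takes the two primers quoted above, which read on the antisense strand, and reports the sense-strand base that each 3' end pairs with. The helper name is hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


NORMAL_PRIMER = "GGCATAATCCAGGAAAACTG"
MUTANT_PRIMER = "GGCATAATCCAGGAAAACTA"

# The 3'-terminal primer base pairs with the first base of the reverse
# complement, i.e. the sense-strand base at the mutation site.
normal_sense = reverse_complement(NORMAL_PRIMER)
mutant_sense = reverse_complement(MUTANT_PRIMER)

print(normal_sense[0], mutant_sense[0])    # C T
print(normal_sense[:3], mutant_sense[:3])  # CAG TAG
```

The sense-strand bases C and T match the 1609 C → T change, and the first three sense-strand bases read CAG and TAG, the glutamine and stop codons named in the text.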

3.8.6 MUTATIONS IN EXON 11

In mutation G542X, the glycine codon (GGA) at amino
acid position 542 is changed to a stop codon (TGA) (G542
→ Stop). The single chromosome carrying this mutation is
of Ashkenazic Jewish origin (family A) and has the B
haplotype (XV2C allele 1; KM.19 allele 2). The mutant
sequence can be detected by hybridization analysis with
allele-specific oligonucleotides (ASOs) on genomic DNA
amplified under conditions of Table 6 by PCR with the
11i-5 and 11i-3 oligonucleotide primers. The normal ASO
is 5'-ACCTTCTCCAAGAACT-3' and the mutant ASO is
5'-ACCTTCTCAAAGAACT-3'. The oligonucleotide hybridization
condition is as described in Kerem et al (1989) supra and
the washing conditions are twice in 5×SSC for 10 min
each at room temperature followed by twice in 2×SSC for
30 min each at 45°C. The mutation is not detected in 52
other non-ΔF508 CF chromosomes, 11 of which are of Jewish
origin (three have a B haplotype), nor in 13 normal
chromosomes.

In mutation S549R, the highly conserved serine
residue of the nucleotide binding domain at position 549
is changed to arginine (S549 → R); the codon change is
AGT → AGG. The CF chromosome with this mutation is
carried by a non-Ashkenazic Jewish patient from Morocco
(family B). The chromosome also has the B haplotype.
Detection of this mutation may be achieved by ASO
hybridization or allele-specific PCR. In the ASO
hybridization procedure, the genomic DNA sequence is
first amplified under conditions of Table 6 by PCR with
the 11i-5 and 11i-3 oligonucleotides; the ASO for the
normal sequence is 5'-ACACTGAGTGGAGGTC-3' and that for
the mutant is 5'-ACACTGAGGGGAGGTC-3'. The oligonucleotide
hybridization condition is as described by Kerem et al
(1989) supra and the washings are done twice in 5×SSC
for 10 min each at room temperature followed by twice in
2×SSC for 30 min each at 56°C. In the allele-specific
PCR amplification, the oligonucleotide primer for the
normal sequence is 5'-TGCTCGTTGACCTCCA-3', that for the
mutant is 5'-TGCTCGTTGACCTCCC-3' and that for the common,
outside sequence is 11i-5. The reaction is performed
with 500 ng of genomic DNA, 100 ng of each of the
oligonucleotides and 0.5 unit of Taq polymerase. The DNA
template is first denatured by heating at 94°C for 6
min, followed by 30 cycles of 94° for 30 sec, 55° for 30
sec and 72° for 60 sec. The reaction is completed by
heating at 72° for 7 min. This S549 → R mutation is
not present in 52 other non-ΔF508 CF chromosomes, 11 of
which are of Jewish origin (three have a B haplotype),
nor in 13 normal chromosomes.

In the S549I mutation there is an AGT → ATT change
(nucleotide position 1778 G → T), which represents the third
mutation involving this amino acid codon and results in a
loss of the DdeI site. We have only one example, which is
of Arabic origin and has been sequenced; no other DdeI-
resistant chromosome is found in 5 other Arabic CF, 21
Jewish CF, 41 Canadian CF, and 13 Canadian normal
chromosomes.

In mutation R560T, the arginine (AGG) at amino acid
position 560 is changed to threonine (ACG). The
individual carrying this mutation (R560 → T) is from a
family we have identified in our work as family #32 and
the chromosome is marked by haplotype IIIb. The mutation
creates a MaeII site which cleaves the PCR product of
exon 11 (generated with primers 11i-5 and 11i-3 under
conditions of Table 6) into two fragments of 214 and 204
bp in size. None of the 36 non-ΔF508 CF chromosomes
(seven of which have haplotype IIIb) or 23 normal
chromosomes (16 have haplotype IIIb) carried this
sequence alteration. The R560 → T mutation is also not
present on eight CF chromosomes with the ΔF508 mutation.

In mutation G551D, glycine (G) at amino acid position
551 is changed to aspartic acid (D). G551 is a highly
conserved residue within the ATP-binding fold. The
corresponding codon change is from GGT to GAT. The
G551 → D change is found in 2 of our families (#1, #38)
with pancreatic insufficient (PI) CF patients and 1
family (#54) with a pancreatic sufficient (PS) patient.
The other CF chromosomes in families #1 and #38 carry the
ΔF508 mutation and that in family #54 is unknown. Based
on our "severe and mild mutation" hypothesis (Kerem et
al, 1989), this mutation is expected to be a "severe"
one. All 3 chromosomes carrying this mutation belong to
Group IIIb. This G551 → D substitution does not represent
a sequence polymorphism because the change is not
detected in 35 other CF chromosomes without the ΔF508
deletion (5 of them from group IIIb) and 19 normal
chromosomes (including 5 from group IIIb). To detect
this mutation, the genomic DNA region may be amplified
under conditions of Table 6 by PCR with primers 11i-5
(5'-CAACTGTGGTTAAAGCAATAGTGT-3') and 11i-3
(5'-GCACAGATTCTGAGTAACCATAAT-3') and examined for the
presence of an MboI (Sau3A) site created by the nucleotide
change; the uncut (normal) form is 419 bp in length and
the digestion products (from the mutant form) are 241 and
178 bp.

3.8.7 MUTATIONS IN EXON 12

In the Y563N mutation a T to A change is detected at
nucleotide position 1820 in exon 12. This switch would
result in a change from Tyr to Asn at amino acid position
563. It is found in a single family with 2 PS patients
but the mutation in the other chromosome is unknown. We
think Y563N is probably a missense mutation because (1)
the T to A change is not found in 59 other CF
chromosomes, with 8 having the same haplotype (IIa) and
30 having ΔF508; and (2) this alteration is not found in
54 normal chromosomes, with 39 having the IIa haplotype.
Unfortunately, the amino acid change is not drastic
enough to permit a strong argument. This putative
mutation can be detected by ASO hybridization with a
normal (5'-AGCAGTATACAAAGATGC-3') and a mutant
(5'-AGCAGTAAACAAAGATGC-3') oligonucleotide probe. The
washing condition is 54°C with 2×SSC.

In the P574H mutation the C at nucleotide position
1853 is changed to A. Although the amino acid Pro at
this position is not highly conserved across different
ATP-binding folds, the change to His could be a drastic
substitution. This change is not detected in 52 other CF
chromosomes nor in 15 normal chromosomes, 4 of which have
the same group IV haplotype. Based on these arguments,
we believe P574H is a mutation. To detect this putative
mutation, one may use the following ASOs:
5'-GACTCTCCTTTTGGA-3' for the normal and
5'-GACTCTCATTTTGGA-3' for the mutant. Washing should be
done at 47° in 2×SSC.

In the L1077P mutation, the T at nucleotide position
3362 is changed to C. This results in a change of the
amino acid Leu to Pro at amino acid position 1077 in Figure 1.
As with the other mutations in this exon, the genomic DNA
is amplified by use of the primers of Table 5, namely
17bi-5 and 17bi-3. The reaction conditions for amplifying
the genomic DNA are set out in Table 6.

The Y1092X mutation involves a change of C at
nucleotide position 3408 to A. This would result in
protein synthesis termination at amino acid position 1092.
Hence the amino acid Tyr is not present in the truncated
polypeptide. As with the above procedures, the primers
used in amplifying this mutation are 17bi-5 and 17bi-3.

3.8.8 MUTATIONS IN EXON 19

3659 del C is a frameshift mutation in exon 19 in a
single CF chromosome (Toronto family #2). There is a
deletion of C at nucleotide position 3659 or 3660. The
chromosome has haplotype IIa. The mutation is not
present in 57 non-ΔF508 CF chromosomes (7 from IIa) and
50 N chromosomes (43 from IIa). The deletion may be
detected by PCR with a common oligonucleotide primer
19i-5 (see Table 5) and 2 ASO primers, HSC8
(5'-GTATGGTTTGGTTGACTTGG-3') for the normal and HSC9
(5'-GTATGGTTTGGTTGACTTGT-3') for the mutant allele. The PCR
condition is as usual except that the annealing temperature
is 60°C to improve specificity.

3.8.9 MUTATIONS IN INTRON 4

In the 621+1G → T mutation there is a single bp
change affecting the splice site (GT → TT) at the 3' end
of exon 4. This mutation is detected in 5 French-Canadian
CF chromosomes (one each in Toronto families #22, 23, 26,
36 and 53) but not in 33 other CF chromosomes (18 from
the same group, group I) and 29 N chromosomes (13 from
group I). The mutation creates an MseI site. Genomic DNA
may be amplified by the 2 intron primers, 4i-5 and 4i-3,
and cut with MseI to distinguish the normal and mutant
alleles. The normal allele gives 4 fragments of 33, 35, 71
and 298 bp in size; the 298 bp fragment in the mutant is
cleaved by the enzyme to give fragments of 54 and 244 bp.

3.8.10 MUTATIONS IN INTRON 5

In the 711+1G → T mutation this G to T switch
occurs at the splice junction after exon 5. The mutation
is found on the mother's CF chromosome in family #22, a
French Canadian family from Chicoutimi. The other
mutation in this family is 621+1G → T.

3.8.11 MUTATIONS IN INTRON 10

In the 1717-1G → A mutation a putative splice
mutation is found in front of exon 11. This mutation is
located at the last nucleotide of the intron before exon
11. The mutation may be detected with the following
ASOs: normal ASO = 5'-TTTGGTAATAGGACATCTCC-3'; mutant ASO =
5'-TTTGGTAATAAGACATCTCC-3'. The washing conditions after
hybridization are 5×SSC twice for 10 min at room temp, then
2×SSC twice for 30 min at 47° for the mutant ASO and 2×SSC
twice for 30 min at 48° for the normal ASO. We have only
a single example, from an Arabic patient, and there is no
haplotype data. The mutation is not found in 5 other
Arabic, 21 Jewish, and 41 Canadian CF chromosomes, nor in
13 normal chromosomes.

3.9 DNA SEQUENCE POLYMORPHISMS

Nucleotide position     Amino acid change

1540 (A or G)           Met or Val
1716 (G or A)           no change (Glu)
2694 (T or G)           no change (Thr)
356 (G or A)            Arg or Gln

A polymorphism is detected at nucleotide position 1540:
the A residue can be substituted by G, changing the
corresponding amino acid from Met to Val. At position
2694, the T residue can be a G, although this does not
change the encoded amino acid. The polymorphism may be
detected by restriction enzymes AvaII or Sau96I. These
changes are present in the normal population and show
good correlation with haplotypes but not with CF disease.

There can be a G to A change for the last nucleotide
of exon 10 (nucleotide position 1716). We think that
this nucleotide substitution is a sequence polymorphism
because (a) it does not alter the amino acid, (b) it is
unlikely to cause a splicing defect and (c) it occurs on
some normal chromosomes. In two Canadian families, this
rare allele is found associated with haplotype IIIb.

The more common nucleotide at 356 (G) is found to be
changed to A in the father's normal chromosome in family
#54. The amino acid changes from Arg to Gln.

4.0 CFTR PROTEIN

As discussed with respect to the DNA sequence of
Figure 1, analysis of the sequence of the overlapping
cDNA clones predicted an unprocessed polypeptide of 1480
amino acids with a molecular mass of 168,138 daltons. As
later described, due to polymorphisms in the protein, the
molecular weight of the protein can vary due to possible
substitutions or deletion of certain amino acids. The
molecular weight will also change due to the addition of
carbohydrate units to form a glycoprotein. It is also
understood that the functional protein in the cell will
be similar to the unprocessed polypeptide, but may be
modified due to cell metabolism.

Accordingly, purified normal CFTR polypeptide is
characterized by a molecular weight of about 170,000
daltons and by epithelial cell transmembrane ion
conductance activity. The normal CFTR polypeptide, which
is substantially free of other human proteins, is encoded
by the aforementioned DNA sequences and, according to one
embodiment, that of Figure 1. Such polypeptide displays
the immunological or biological activity of normal CFTR
polypeptide. As will be later discussed, the CFTR
polypeptide and fragments thereof may be made by chemical
or enzymatic peptide synthesis or expressed in an
appropriate cultured cell system. The invention provides
purified 507 mutant CFTR polypeptide which is
characterized by cystic fibrosis-associated activity in
human epithelial cells. Such 507 mutant CFTR
polypeptide, as substantially free of other human
proteins, can be encoded by the 507 mutant DNA sequence.

4.1 STRUCTURE OF CFTR

The most characteristic feature of the predicted
protein is the presence of two repeated motifs, each of
which consists of a set of amino acid residues capable of
spanning the membrane several times followed by sequence
resembling consensus nucleotide (ATP)-binding folds
(NBFs) (Figures 11, 12 and 15). These characteristics
are remarkably similar to those of the mammalian
multidrug-resistant P-glycoprotein and a number of other
membrane-associated proteins, thus implying that the
predicted CF gene product is likely to be involved in the
transport of substances (ions) across the membrane and is
probably a member of a membrane protein superfamily.

Figure 13 is a schematic model of the predicted CFTR
protein. In Figure 13, cylinders indicate membrane
spanning helices and hatched spheres indicate NBFs. The
stippled sphere is the polar R-domain. The 6 membrane
spanning helices in each half of the molecule are
depicted as cylinders. The inner cytoplasmically
oriented NBFs are shown as hatched spheres with slots to
indicate the means of entry by the nucleotide. The large
polar R-domain which links the two halves is represented
by a stippled sphere. Charged individual amino acids
within the transmembrane segments and on the R-domain
surface are depicted as small circles containing the
charge sign. Net charges on the internal and external
loops joining the membrane cylinders and on regions of
the NBFs are contained in open squares. Sites for
phosphorylation by protein kinases A or C are shown by
closed and open triangles respectively. K, R, H, D, and E
are standard nomenclature for the amino acids lysine,
arginine, histidine, aspartic acid and glutamic acid
respectively.

Each of the predicted membrane-associated regions of
the CFTR protein consists of 6 highly hydrophobic
segments capable of spanning a lipid bilayer according to
the algorithms of Kyte and Doolittle and of Garnier et al
(J. Mol. Biol. 120, 97 (1978)) (Figure 13). The
membrane-associated regions are each followed by a large
hydrophilic region containing the NBFs. Based on
sequence alignment with other known nucleotide binding
proteins, each of the putative NBFs in CFTR comprises at
least 150 residues (Figure 13). The 3 bp deletion at
position 507 as detected in CF patients is located
between the 2 most highly conserved segments of the first
NBF in CFTR. The amino acid sequence identity between
the region surrounding the isoleucine deletion and the
corresponding regions of a number of other proteins
suggests that this region is of functional importance
(Figure 15). A hydrophobic amino acid, usually one with
an aromatic side chain, is present in most of these
proteins at the position corresponding to I507 of the
CFTR protein. It is understood that amino acid
polymorphisms may exist as a result of DNA polymorphisms.
Similarly, mutations at the other positions in the
protein suggest that corresponding regions of the
protein are also of functional importance. Such
additional mutations include substitutions of:

i) Glu for Gly at amino acid position 85;

ii) Thr for Ile at amino acid position 148;

iii) Arg for Gly at amino acid position 178;

iv) Glu for Ala at amino acid position 455;

v) stop codon for Gln at amino acid position 493;

vi) stop codon for Gly at amino acid position 542;

vii) Arg for Ser or Ile for Ser at amino acid
position 549;

viii) Asp for Gly at amino acid position 551;

ix) Thr for Arg at amino acid position 560;

x) Asn for Tyr at amino acid position 563;

xi) His for Pro at amino acid position 574;

xii) Pro for Leu at amino acid position 1077;

xiii) stop codon for Tyr at amino acid position 1092.
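
The substitutions listed above can also be written in compact single-letter form. The following is a minimal sketch that transcribes items i) to xiii) into a table and derives such names. The one-letter residue codes are standard; the use of "X" for a stop codon is a common shorthand assumed here and is not notation used in this document.

```python
# Three-letter to one-letter codes for the residues named in the list.
# "Stop" -> "X" is an assumed shorthand for a stop codon.
THREE_TO_ONE = {
    "Gly": "G", "Glu": "E", "Ile": "I", "Thr": "T", "Arg": "R",
    "Ala": "A", "Gln": "Q", "Ser": "S", "Asp": "D", "Tyr": "Y",
    "Asn": "N", "Pro": "P", "His": "H", "Leu": "L", "Stop": "X",
}

# (normal residue, amino acid position, replacement), from items i) to xiii).
# Item vii) names two alternative replacements, so it appears twice.
SUBSTITUTIONS = [
    ("Gly", 85, "Glu"), ("Ile", 148, "Thr"), ("Gly", 178, "Arg"),
    ("Ala", 455, "Glu"), ("Gln", 493, "Stop"), ("Gly", 542, "Stop"),
    ("Ser", 549, "Arg"), ("Ser", 549, "Ile"), ("Gly", 551, "Asp"),
    ("Arg", 560, "Thr"), ("Tyr", 563, "Asn"), ("Pro", 574, "His"),
    ("Leu", 1077, "Pro"), ("Tyr", 1092, "Stop"),
]


def short_name(normal, position, replacement):
    """Return a name such as 'G85E' for a Gly-to-Glu change at position 85."""
    return f"{THREE_TO_ONE[normal]}{position}{THREE_TO_ONE[replacement]}"


names = [short_name(*entry) for entry in SUBSTITUTIONS]
print(", ".join(names))
```

Keeping the table as data rather than prose makes it easy to check that every position cited elsewhere in the text has a matching entry.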

Figure 15 shows alignment of the 3 most conserved 
segments of the extended NBF's of CFTR with comparable 
regions of other proteins. These 3 segments consist of 
residues 433-473, 488-513, and 542-584 of the N-terminal 
half and 1219-1259, 1277-1302, and 1340-1382 of the C- 
terminal half of CFTR. The heavy overlining points out 
the regions of greatest similarity. Additional general 
homology can be seen even without the introduction of 
gaps. 
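
As a small check on the residue ranges quoted above, the sketch below computes the length of each conserved segment and reports which of the mutation positions named in this document fall inside one of them. The ranges and positions are taken from the text; the helper names are illustrative only.

```python
# Conserved NBF segments as inclusive residue ranges, from the text above.
N_TERMINAL = [(433, 473), (488, 513), (542, 584)]
C_TERMINAL = [(1219, 1259), (1277, 1302), (1340, 1382)]

# Amino acid positions of the mutations named in this document.
MUTATION_POSITIONS = [85, 148, 178, 455, 493, 507, 542, 549,
                      551, 560, 563, 574, 1077, 1092]


def length(segment):
    """Number of residues in an inclusive (start, end) range."""
    start, end = segment
    return end - start + 1


def segment_containing(position, segments):
    """Return the segment that contains the position, or None."""
    for segment in segments:
        if segment[0] <= position <= segment[1]:
            return segment
    return None


n_lengths = [length(s) for s in N_TERMINAL]
c_lengths = [length(s) for s in C_TERMINAL]

hits = {}
for position in MUTATION_POSITIONS:
    segment = segment_containing(position, N_TERMINAL + C_TERMINAL)
    if segment is not None:
        hits[position] = segment

print("N-terminal segment lengths:", n_lengths)
print("C-terminal segment lengths:", c_lengths)
for position, segment in sorted(hits.items()):
    print(f"position {position} lies in residues {segment[0]}-{segment[1]}")
```

The paired segments in the two halves have matching lengths, which is consistent with their being aligned without gaps.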

Despite the overall symmetry in the structure of the 
protein and the sequence conservation of the NBFs, 
sequence homology between the two halves of the predicted 
CFTR protein is modest. This is demonstrated in Figure 
12, where amino acids 1-1480 are represented on each 
axis. Lines on either side of the identity diagonal 
indicate the positions of internal similarities. 
Therefore, while four sets of internal sequence identity
can be detected as shown in Figure 12, using the Dayhoff
scoring matrix as applied by Lawrence et al. [C. B.
Lawrence, D. A. Goldman, and R. T. Hood, Bull. Math. Biol.
48, 569 (1986)], three of these are only apparent at low
threshold settings for standard deviation. The strongest
identity is between sequences at the carboxyl ends of the
NBFs. Of the 66 residues aligned, 27% are identical and
another 11% are functionally similar. The overall weak
internal homology is in contrast to the much higher
degree (>70%) in P-glycoprotein for which a gene
duplication hypothesis has been proposed (Gros et al,
Cell 47, 371, 1986; C. Chen et al, Cell 47, 381, 1986;
Gerlach et al, Nature 324, 485, 1986; Gros et al, Mol.
Cell. Biol. 8, 2770, 1988). The lack of conservation in
the relative positions of the exon-intron boundaries may
argue against such a model for CFTR (Figure 2).

Since there is apparently no signal-peptide sequence
at the amino-terminus of CFTR, the highly charged
hydrophilic segment preceding the first transmembrane
sequence is probably oriented in the cytoplasm. Each of
the 2 sets of hydrophobic helices is expected to form 3
transversing loops across the membrane and little
sequence of the entire protein is expected to be exposed
to the exterior surface, except the region between
transmembrane segments 7 and 8. It is of interest to note
that the latter region contains two potential sites for
N-linked glycosylation.
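
Potential N-linked glycosylation sites are conventionally recognized by the sequon Asn-X-Ser/Thr, where X is any residue except proline. The sketch below scans a protein sequence for that pattern. The sequon rule is general biochemistry rather than something stated in this document, and the test sequence is a made-up string, not CFTR sequence.

```python
import re

# Asn followed by any residue except Pro, then Ser or Thr.
# The lookahead lets overlapping sequons be reported individually.
SEQUON = re.compile(r"N(?=[^P][ST])")


def n_glycosylation_sites(protein):
    """Return 1-based positions of Asn residues in N-X-S/T sequons."""
    return [match.start() + 1 for match in SEQUON.finditer(protein)]


# Hypothetical sequence for illustration only.
example = "AANGSAANPSAANATA"
print(n_glycosylation_sites(example))
```

A match only marks a candidate site; whether it is glycosylated depends on its exposure to the exterior side of the membrane, which is why the loop between transmembrane segments 7 and 8 is singled out above.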

Each of the membrane-associated regions is followed
by an NBF as indicated above. In addition, a highly
charged cytoplasmic domain can be identified in the
middle of the predicted CFTR polypeptide, linking the 2
halves of the protein. This domain, named the R-domain,
is operationally defined by a single large exon in which
69 of the 241 amino acids are polar residues arranged in
alternating clusters of positive and negative charges.
Moreover, 9 of the 10 consensus sequences required for
phosphorylation by protein kinase A (PKA), and 7 of
the potential substrate sites for protein kinase C (PKC)
found in CFTR are located in this exon.

4.2 FUNCTION OF CFTR

Properties of CFTR can be derived from comparison to
other membrane-associated proteins (Figure 15). In
addition to the overall structural similarity with the
mammalian P-glycoprotein, each of the two predicted
domains in CFTR also shows remarkable resemblance to the
single domain structure of hemolysin B of E. coli and the
product of the White gene of Drosophila. These latter
proteins are involved in the transport of the lytic
peptide of the hemolysin system and of eye pigment
molecules, respectively. The vitamin B12 transport
system of E. coli, BtuD, and MbpX, which is a liverwort
chloroplast gene whose function is unknown, also have a
similar structural motif. Furthermore, the CFTR protein
shares structural similarity with several of the
periplasmic solute transport systems of gram negative
bacteria where the transmembrane region and the
ATP-binding folds are contained in separate proteins which
function in concert with a third substrate-binding
polypeptide.

The overall structural arrangement of the
transmembrane domains in CFTR is similar to several
cation channel proteins and some cation-translocating
ATPases as well as the recently described adenylate
cyclase of bovine brain. The functional significance of
this topological classification, consisting of 6
transmembrane domains, remains speculative.

Short regions of sequence identity have also been
detected between the putative transmembrane regions of
CFTR and other membrane-spanning proteins.
Interestingly, there are also sequences, 18 amino acids
in length, situated approximately 50 residues from the
carboxyl terminus of CFTR and the raf serine/threonine
kinase protooncogene of Xenopus laevis which are
identical at 12 of these positions.

Finally, an amino acid sequence identity (10/13
conserved residues) has been noted between a hydrophilic
segment (position 701-713) within the highly charged
R-domain of CFTR and a region immediately preceding the
first transmembrane loop of the sodium channels in both
rat brain and eel. The charged R-domain of CFTR is not
shared with the topologically closely related
P-glycoprotein; the 241 amino acid linking-peptide is
apparently the major difference between the two proteins.

In summary, features of the primary structure of the 
CFTR protein indicate its possession of properties 
suitable to participation in the regulation and control 
of ion transport in the epithelial cells of tissues 
affected in CF. Secure attachment to the membrane in two
regions serves to position its three major intracellular
domains (nucleotide-binding folds 1 and 2 and the R- 
domain) near the cytoplasmic surface of the cell membrane 
where they can modulate ion movement through channels 
formed either by CFTR transmembrane segments themselves 
or by other membrane proteins. 

In view of the genetic data, the tissue-specificity, 
and the predicted properties of the CFTR protein, it is 
reasonable to conclude that CFTR is directly responsible
for CF. It remains unclear, however, how CFTR is
involved in the regulation of ion conductance across the 
apical membrane of epithelial cells. 

It is possible that CFTR serves as an ion channel
itself. As depicted in Figure 13, 10 of the 12
transmembrane regions contain one or more amino acids
with charged side chains, a property similar to the brain
sodium channel and the GABA receptor chloride channel
subunits, where charged residues are present in 4 of the
6, and 3 of the 4, respective membrane-associated domains
per subunit or repeat unit. The amphipathic nature of
these transmembrane segments is believed to contribute to
the channel-forming capacity of these molecules.
Alternatively, CFTR may not be an ion channel but instead
serve to regulate ion channel activities. In support of
the latter assumption, none of the purified polypeptides
from trachea and kidney that are capable of
reconstituting chloride channels in lipid membranes
[Landry et al, Science 244:1469 (1989)] appear to be CFTR
if judged on the basis of the molecular mass.

In either case, the presence of ATP-binding domains
in CFTR suggests that ATP hydrolysis is directly involved
and required for the transport function. The high
density of phosphorylation sites for PKA and PKC and the
clusters of charged residues in the R-domain may both
serve to regulate this activity. The deletion of a
phenylalanine residue in the NBF may prevent proper
binding of ATP or the conformational change which this
normally elicits and consequently result in the observed
insensitivity to activation by PKA- or PKC-mediated
phosphorylation of the CF apical chloride conductance
pathway. Since the predicted protein contains several
domains and belongs to a family of proteins which
frequently function as parts of multi-component molecular
systems, CFTR may also participate in epithelial tissue
functions of activity or regulation not related to ion
transport.

With the isolated CF gene (cDNA) now in hand it is
possible to define the basic biochemical defect in CF and
to further elucidate the control of ion transport
pathways in epithelial cells in general. Most important,
knowledge gained thus far from the predicted structure of
CFTR together with the additional information from
studies of the protein itself provides a basis for the
development of improved means of treatment of the
disease. In such studies, antibodies have been raised to
the CFTR protein as later described.

5.0 CF SCREENING

5.1 DNA BASED DIAGNOSIS

Given the knowledge of the 85, 148, 178, 455, 493,
507, 542, 549, 551, 560, 563, 574, 1077 and 1092 amino
acid position mutations and the nucleotide sequence
variants at DNA sequence positions 129, 556, 621+1,
711+1, 1717-1 and 3659 as disclosed herein, carrier
screening and prenatal diagnosis can be carried out as
follows.

The high risk population for cystic fibrosis is
Caucasians. For example, each Caucasian woman and/or man
of child-bearing age would be screened to determine if
she or he was a carrier (approximately a 5% probability
for each individual). If both are carriers, they are a
couple at risk for a cystic fibrosis child. Each child
of the at risk couple has a 25% chance of being affected
with cystic fibrosis. The procedure for determining
carrier status using the probes disclosed herein is as
follows.
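
The risk figures quoted in the preceding paragraph can be combined in a short calculation. The sketch below uses the 5% carrier probability and the 25% risk per child given in the text, and assumes that the carrier status of the two partners is independent, which is an assumption of this example and is not stated in the document.

```python
from fractions import Fraction

# Figures taken from the text above.
CARRIER_PROBABILITY = Fraction(5, 100)
AFFECTED_GIVEN_BOTH_CARRIERS = Fraction(25, 100)


def couple_at_risk_probability(carrier=CARRIER_PROBABILITY):
    """Probability that both partners are carriers (independence assumed)."""
    return carrier * carrier


def affected_child_probability(carrier=CARRIER_PROBABILITY):
    """Probability that a child of a randomly chosen couple is affected."""
    return couple_at_risk_probability(carrier) * AFFECTED_GIVEN_BOTH_CARRIERS


print("couple at risk:", couple_at_risk_probability())
print("affected child:", affected_child_probability())
```

Exact fractions are used so that the results read directly as "1 in N" figures.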

For purposes of brevity, the discussion on screening
by use of one of the selected mutations is directed to
the I507 mutation. It is understood that screening can
also be accomplished using one of the other mutations or
using several of the mutations in a screening process or
mutation detection process of this section on CF
screening involving DNA diagnosis and mutation detection.

One major application of the DNA sequence
information of the normal and 507 mutant CF gene is in
the area of genetic testing, carrier detection and
prenatal diagnosis. Individuals carrying mutations in
the CF gene (disease carriers or patients) may be detected
at the DNA level with the use of a variety of techniques.
The genomic DNA used for the diagnosis may be obtained
from body cells, such as those present in peripheral
blood, urine, saliva, tissue biopsy, surgical specimen
and autopsy material. The DNA may be used directly for
detection of specific sequence or may be amplified
enzymatically in vitro by using PCR [Saiki et al. Science
230: 1350-1353 (1985), Saiki et al. Nature 324: 163-166
(1986)] prior to analysis. RNA or its cDNA form may also
be used for the same purpose. Recent reviews of this
subject have been presented by Caskey [Science 236:
1223-8 (1989)] and by Landegren et al [Science 242:
229-237 (1989)].

The detection of specific DNA sequence may be
achieved by methods such as hybridization using specific
oligonucleotides [Wallace et al. Cold Spring Harbour
Symp. Quant. Biol. 51: 257-261 (1986)], direct DNA
sequencing [Church and Gilbert, Proc. Nat. Acad. Sci.
U.S.A. 81: 1991-1995 (1988)], the use of restriction
enzymes [Flavell et al. Cell 15: 25 (1978), Geever et al
Proc. Nat. Acad. Sci. U.S.A. 78: 5081 (1981)],
discrimination on the basis of electrophoretic mobility
in gels with denaturing reagent (Myers and Maniatis, Cold
Spring Harbour Symp. Quant. Biol. 51: 275-284 (1986)),
RNase protection (Myers, R. M., Larin, J., and T.
Maniatis Science 230: 1242 (1985)), chemical cleavage
(Cotton et al Proc. Nat. Acad. Sci. U.S.A. 85: 4397-4401
(1985)) and the ligase-mediated detection procedure
[Landegren et al Science 241:1077 (1988)].

Oligonucleotides specific to normal or mutant
sequences are chemically synthesized using commercially
available machines, labelled radioactively with isotopes
(such as 32P) or non-radioactively (with tags such as
biotin (Ward and Langer et al. Proc. Nat. Acad. Sci.
U.S.A. 78: 6633-6657 (1981))), and hybridized to individual
DNA samples immobilized on membranes or other solid
supports by dot-blot or transfer from gels after
electrophoresis. The presence or absence of these
specific sequences is visualized by methods such as
autoradiography or fluorometric (Landegren et al, 1989,
supra) or colorimetric reactions (Gebeyehu et al. Nucleic
Acids Research 15: 4513-4534 (1987)). An embodiment of
this oligonucleotide screening method has been applied in
the detection of the I507 deletion as described herein.
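
The logic of screening with a pair of allele-specific oligonucleotides can be sketched as follows. This is a simplified illustration, not the procedure of this document: the fragment is an illustrative coding sequence chosen to translate to ENIIFGV, the probe sequences are hypothetical (the actual probes are those of Figure 19, which is not reproduced here), and hybridization is modelled as an exact substring match, ignoring strand complementarity and mismatch tolerance.

```python
# Illustrative fragment (codons GAA AAT ATC ATC TTT GGT GTT) and the same
# fragment with one ATC codon removed. Both are assumptions for this sketch.
NORMAL_ALLELE = "GAAAATATCATCTTTGGTGTT"
MUTANT_ALLELE = NORMAL_ALLELE[:6] + NORMAL_ALLELE[9:]

# Hypothetical probes spanning the deletion site on each allele.
NORMAL_PROBE = "ATCATCTTTGG"
MUTANT_PROBE = "AATATCTTTGG"


def hybridizes(probe, target):
    """Model stringent hybridization as a perfect match within the target."""
    return probe in target


def probe_pattern(alleles):
    """Return (normal probe signal, mutant probe signal) for a DNA sample."""
    normal_signal = any(hybridizes(NORMAL_PROBE, a) for a in alleles)
    mutant_signal = any(hybridizes(MUTANT_PROBE, a) for a in alleles)
    return normal_signal, mutant_signal


samples = {
    "non-carrier": [NORMAL_ALLELE, NORMAL_ALLELE],
    "carrier": [NORMAL_ALLELE, MUTANT_ALLELE],
    "homozygous deletion": [MUTANT_ALLELE, MUTANT_ALLELE],
}
for label, alleles in samples.items():
    print(label, probe_pattern(alleles))
```

Each probe spans the deletion junction, so it matches only one allele; a sample that lights up with both probes carries one copy of each.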
Sequence differences between normal and mutants may
be revealed by the direct DNA sequencing method of Church
and Gilbert (supra). Cloned DNA segments may be used as
probes to detect specific DNA segments. The sensitivity
... gel electrophoresis, as have been detected for the 3 bp
(I507) mutation and in other experimental systems
[Nagamine et al, Am. J. Hum. Genet. 45:337-339 (1989)].

Alternatively, a method of detecting a mutation
comprising a single base substitution or other small
change could be based on differential primer length in a
PCR. For example, one invariant primer could be used in
addition to a primer specific for a mutation. The PCR
products of the normal and mutant genes can then be
differentially detected in acrylamide gels.
Sequence changes at specific locations may also be
revealed by nuclease protection assays, such as RNase
(Myers, supra) and S1 protection (Berk, A. J., and P. A.
Sharp Proc. Nat. Acad. Sci. U.S.A. 75: 1274 (1978)),
the chemical cleavage method (Cotton, supra) or the
ligase-mediated detection procedure (Landegren, supra).

In addition to conventional gel-electrophoresis and
blot-hybridization methods, DNA fragments may also be
visualized by methods where the individual DNA samples
are not immobilized on membranes. The probe and target
sequences may be both in solution or the probe sequence
may be immobilized [Saiki et al, Proc. Natl. Acad. Sci.
USA, 86:6230-6234 (1989)]. A variety of detection
methods, such as autoradiography involving radioisotopes,
direct detection of radioactive decay (in the presence or
absence of scintillant), spectrophotometry involving
colorigenic reactions and fluorometry involving
fluorogenic reactions, may be used to identify specific
individual genotypes.

Since more than one mutation is anticipated in the
CF gene such as I507 and F508, a multiplex system is an
ideal protocol for screening CF carriers and detection of
specific mutations. For example, a PCR with multiple,
specific oligonucleotide primers and hybridization
probes may be used to identify all possible mutations at
the same time (Chamberlain et al. Nucleic Acids Research
16: 1141-1155 (1988)). The procedure may involve
immobilized sequence-specific oligonucleotide probes
(Saiki et al, supra).

5.2 DETECTING THE CF 507 MUTATION

These detection methods may be applied to prenatal
diagnosis using amniotic fluid cells, chorionic villi
biopsy or sorting fetal cells from maternal circulation.
The test for CF carriers in the population may be
incorporated as an essential component in a broad-scale
genetic testing program for common diseases.

According to an embodiment of the invention, the
portion of the DNA segment that is informative for a
mutation, such as the mutation according to this
embodiment, that is, the portion that immediately
surrounds the I507 deletion, can then be amplified by
using standard PCR techniques [as reviewed in Landegren,
Ulf, Robert Kaiser, C. Thomas Caskey, and Leroy Hood, DNA
Diagnostics - Molecular Techniques and Automation, in
Science 242: 229-237 (1988)]. It is contemplated that
the portion of the DNA segment which is used may be a
single DNA segment or a mixture of different DNA
segments. A detailed description of this technique now
follows.

A specific region of genomic DNA from the person or
fetus is to be screened. Such specific region is defined
by the oligonucleotide primers C16B
(5'GTTTTCCTGGATTATGCCTGGCAC3') and C16D
(5'GTTGGCATGCTTTGATGACGCTTC3') or as shown in Figure 18
by primers 10i-5 and 10i-3. The specific regions using
10i-5 and 10i-3 were amplified by the polymerase chain
reaction (PCR). 200-400 ng of genomic DNA, from either
cultured lymphoblasts or peripheral blood samples of CF
individuals and their parents, were used in each PCR with
the oligonucleotide primers indicated above. The
oligonucleotides were purified with Oligonucleotide
Purification Cartridges™ (Applied Biosystems) or NENSORB™
PREP columns (Dupont) with procedures recommended by the
suppliers. The primers were annealed at 55°C for 30 sec,
extended at 72°C for 60 sec (with 2 units of Taq DNA
polymerase) and denatured at 94°C for 60 sec, for 30
cycles with a final cycle of 7 min for extension in a
Perkin-Elmer/Cetus automatic thermocycler with a
Step-Cycle program (transition setting at 1.5 min). Portions
of the PCR products were separated by electrophoresis on
1.4% agarose gels and transferred to Zetabind™ (Biorad)
membrane according to standard procedures.
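
The primer sequences and cycling conditions given above lend themselves to a quick sanity check. The sketch below computes the length and GC content of primers C16B and C16D and sums the programmed step times. The melting temperature uses the Wallace rule, 2(A+T) + 4(G+C), which is a rough approximation intended for short oligonucleotides and is an addition of this example, not a calculation made in the document. The run time excludes the transition time between steps.

```python
# Primer sequences as given in the text (5' to 3').
PRIMERS = {
    "C16B": "GTTTTCCTGGATTATGCCTGGCAC",
    "C16D": "GTTGGCATGCTTTGATGACGCTTC",
}

# Cycling conditions as given in the text, in seconds.
CYCLE_STEPS_SEC = {"anneal 55 C": 30, "extend 72 C": 60, "denature 94 C": 60}
CYCLES = 30
FINAL_EXTENSION_SEC = 7 * 60


def gc_fraction(sequence):
    """Fraction of G and C bases in the sequence."""
    return (sequence.count("G") + sequence.count("C")) / len(sequence)


def wallace_tm(sequence):
    """Rough melting temperature in degrees C by the Wallace rule."""
    at = sequence.count("A") + sequence.count("T")
    gc = sequence.count("G") + sequence.count("C")
    return 2 * at + 4 * gc


def minimum_run_minutes():
    """Programmed hold time only; ramping between steps is not included."""
    per_cycle = sum(CYCLE_STEPS_SEC.values())
    return (CYCLES * per_cycle + FINAL_EXTENSION_SEC) / 60


for name, sequence in PRIMERS.items():
    print(name, len(sequence), gc_fraction(sequence), wallace_tm(sequence))
print("minimum run time (min):", minimum_run_minutes())
```

Both primers have the same length and base composition, so they would be expected to behave similarly at a common annealing temperature.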

The normal and ΔI507 oligonucleotide probes of
Figure 19 (10 ng each) are labeled separately with 10
units of T4 polynucleotide kinase (Pharmacia) in a 10 μl
reaction containing 50 mM Tris-HCl (pH 7.6), 10 mM MgCl2,
0.5 mM dithiothreitol, 10 mM spermidine, 1 mM EDTA and
30-40 μCi of γ[32P]-ATP for 20-30 min at 37°C. The
unincorporated radionucleotides were removed with a
Sephadex G-25 column before use. The hybridization
conditions were as described previously (J.M. Rommens et
al Am. J. Hum. Genet. 43, 645 (1988)) except that the
temperature can be 37°C. The membranes are washed twice
at room temperature with 5 x SSC and twice at 39°C with 2
x SSC (1 x SSC = 150 mM NaCl and 15 mM Na citrate).
Autoradiography is performed at room temperature
overnight. Autoradiographs are developed to show the
hybridization results of genomic DNA with the 2 specific
oligonucleotide probes. Probe C normal detects the
normal DNA sequence and Probe C ΔI507 detects the mutant
sequence.
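
The wash buffers are defined relative to 1 x SSC, so their salt concentrations follow by simple scaling. The sketch below derives them from the definition given in the text. The 20 x stock used in the dilution helper is a common laboratory convention assumed for illustration and is not mentioned in this document.

```python
# Definition of 1 x SSC from the text, in mM.
ONE_X_SSC_MM = {"NaCl": 150, "Na citrate": 15}


def ssc(strength):
    """Concentrations in mM for an SSC buffer of the given strength."""
    return {salt: mm * strength for salt, mm in ONE_X_SSC_MM.items()}


def stock_volume_ml(target_strength, final_volume_ml, stock_strength=20):
    """Volume of concentrated stock to dilute (20 x stock is an assumption)."""
    return target_strength * final_volume_ml / stock_strength


print("5 x SSC:", ssc(5))
print("2 x SSC:", ssc(2))
print("ml of 20 x stock for 1 l of 2 x SSC:", stock_volume_ml(2, 1000))
```

The lower salt concentration and higher temperature of the second wash make it the more stringent of the two.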

Genomic DNA sample from each family member can, as
explained, be amplified by the polymerase chain reaction
using the intron sequences of Figure 18 and the products
separated by electrophoresis on a 1.4% agarose gel and
then transferred to Zetabind (Biorad) membrane according
to standard procedures. The 3 bp deletion of ΔI507 can be
revealed by a very convenient polyacrylamide gel
electrophoresis procedure. When the PCR products
generated by the above-mentioned 10i-5 and 10i-3 primers
are applied to a 5% polyacrylamide gel and electrophoresed
for 3 hrs at 20 V/cm in a 90 mM Tris-borate buffer (pH
8.3), DNA fragments of a different mobility are clearly
detectable for individuals without the 3 bp deletion,
heterozygous or homozygous for the deletion.
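
The three genotype classes named above differ in the sizes of the PCR products they yield. The sketch below lists the expected product lengths for each class. The length of the normal product is a hypothetical placeholder, since the actual amplicon size is not stated in this passage, and heteroduplex species, which also affect mobility in heterozygotes, are deliberately not modelled.

```python
DELETION_BP = 3  # size of the deletion, from the text


def expected_fragments(genotype, normal_length):
    """Return distinct product lengths for a two-letter genotype.

    'N' denotes the normal allele and 'D' the allele with the deletion.
    """
    sizes = set()
    for allele in genotype:
        if allele == "N":
            sizes.add(normal_length)
        elif allele == "D":
            sizes.add(normal_length - DELETION_BP)
        else:
            raise ValueError(f"unknown allele: {allele!r}")
    return sorted(sizes, reverse=True)


# 100 bp is a hypothetical normal product length for illustration only.
for genotype in ("NN", "ND", "DD"):
    print(genotype, expected_fragments(genotype, normal_length=100))
```

A 3 bp difference is small relative to the product, which is why a polyacrylamide gel rather than agarose is used for this step.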

As already explained with respect to Figure 20, the
PCR amplified genomic DNA can be subjected to gel
electrophoresis to identify the 3 bp deletion. As shown
in Figure 20, in the four lanes the first lane is a
control with a normal/ΔF508 deletion. The next lane is
the father with a normal/ΔI507 deletion. The third lane
is the mother with a normal/ΔF508 deletion and the fourth
lane is the child with a ΔF508/ΔI507 deletion. The
homoduplexes show up as solid bands across the base of
each lane. In lanes 1 and 3, the two heteroduplexes show
up very clearly as two spaced apart bands. In lane 2, the
father's ΔI507 mutation shows up very clearly, whereas in
the fourth lane, the child with the adjacent 507, 508
mutations, there are no distinguishable heteroduplexes.
Hence the showing is at the homoduplex line. Since the
father in lane 2 and the mother in lane 3 show
heteroduplex banding and the child does not, this
indicates that the child is either normal or a patient.
This can be further checked if needed, such as in
embryonic analysis, by mixing the 507 and 508 probes to
determine the presence of the ΔI507 and ΔF508 mutations.

Similar alteration in gel mobility for
heteroduplexes formed during PCR has also been reported
for experimental systems where small deletions are
involved (Nagamine et al, supra). These mobility shifts
may be used in general as the basis for the
non-radioactive genetic screening tests.

It is appreciated that approximately 1% of the
carriers can be detected using the specific ΔI507 probes
of this particular embodiment of the invention. Thus, if
an individual tested is not a carrier using the ΔI507
probes, their carrier status cannot be excluded; they
may carry some other mutation, such as the ΔF508 as
previously noted. However, if both the individual and
the spouse of the individual tested are carriers for the
ΔI507 mutation, it can be stated with certainty that they
are an at risk couple. The sequence of the gene as
disclosed herein is an essential prerequisite for the
determination of the other mutations.
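
Why a negative ΔI507 result cannot exclude carrier status can be made concrete with Bayes' rule. The sketch below uses the 5% prior carrier probability and the approximately 1% detection rate quoted in this document, and assumes the test never reports a non-carrier as positive, which is an assumption of this example.

```python
def residual_carrier_risk(prior, detection_rate):
    """Probability of being a carrier after a negative probe result.

    prior: carrier probability before testing.
    detection_rate: fraction of carriers that the probe detects.
    Assumes no false positives among non-carriers.
    """
    missed_carriers = prior * (1 - detection_rate)
    non_carriers = 1 - prior
    return missed_carriers / (missed_carriers + non_carriers)


risk = residual_carrier_risk(prior=0.05, detection_rate=0.01)
print(f"residual carrier risk after a negative result: {risk:.4%}")
```

With so small a detection rate the risk after a negative result is almost unchanged from the prior, which is the point made in the paragraph above.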

Prenatal diagnosis is a logical extension of carrier
screening. A couple can be identified as at risk for
having a cystic fibrosis child in one of two ways. If
they already have a cystic fibrosis child, they are both,
by definition, obligate carriers of the defective CFTR
gene, and each subsequent child has a 25% chance of being
affected with cystic fibrosis. Alternatively, a major
advantage of the present invention is that it eliminates
the need for family pedigree analysis: according to this
invention, a gene mutation screening program as outlined
above or other similar method can be used to identify a
genetic mutation that leads to a protein with altered
function. This is not dependent on prior ascertainment of
the family through an affected child. Fetal DNA samples,
for example, can be obtained, as previously mentioned, from
amniotic fluid cells and chorionic villi specimens.
Amplification by standard PCR techniques can then be
performed on this template DNA.

If both parents are shown to be carriers with the
ΔI507 deletion, the interpretation of the results would
be the following. If there is hybridization of the fetal
DNA to the normal probe, the fetus will not be affected
with cystic fibrosis, although it may be a CF carrier
(50% probability for each fetus of an at risk couple). If
the fetal DNA hybridizes only to the ΔI507 deletion probe
and not to the normal probe, the fetus will be affected
with cystic fibrosis.
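
The interpretation rules above, together with the expected proportions among the offspring of two carriers, can be sketched as follows. The function and label names are illustrative, and the "no signal" branch is an addition of this example for completeness rather than a case discussed in the text.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def interpret_fetal_result(normal_probe_signal, deletion_probe_signal):
    """Interpret probe hybridization when both parents carry the deletion."""
    if normal_probe_signal and deletion_probe_signal:
        return "not affected, carrier"
    if normal_probe_signal:
        return "not affected, non-carrier"
    if deletion_probe_signal:
        return "affected"
    return "no signal, result uninformative"


def offspring_distribution(parent_a, parent_b):
    """Genotype probabilities for the offspring of two parents.

    Each parent is a two-letter string, 'N' for normal and 'D' for deletion.
    """
    counts = Counter("".join(sorted(pair)) for pair in product(parent_a, parent_b))
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


distribution = offspring_distribution("ND", "ND")
for genotype, probability in sorted(distribution.items()):
    print(genotype, probability)
print(interpret_fetal_result(True, True))
```

The heterozygous class appears as "DN" because the two alleles are sorted before counting, so that N/D and D/N are treated as the same genotype.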

It is appreciated that for this and other mutations
in the CF gene, a range of different specific procedures
can be used to provide a complete diagnosis for all
potential CF carriers or patients. These procedures are
described more fully later.

The invention therefore provides a method and kit
for determining if a subject is a CF carrier or CF
patient. In summary, the screening method comprises the
steps of:

providing a biological sample of the subject to be
screened; and providing an assay for detecting in the
biological sample, the presence of at least a member from
the group consisting of a 507 mutant CF gene, 507 mutant
CF gene products and mixtures thereof.

The method may be further characterized by including
at least one more nucleotide probe which is a different
DNA sequence fragment of, for example, the DNA of Figure
1, or a different DNA sequence fragment of human
chromosome 7 and located to either side of the DNA
sequence of Figure 1. In this respect, the DNA fragments
of the intron portions of Figure 2 are useful in further
confirming the presence of the mutation. Unique aspects
... may be relied on in screening procedures to further
confirm the presence of the mutation at the I507 position.

A kit, according to an embodiment of the invention,
suitable for use in the screening technique and for
assaying for the presence of the mutant CF gene by an
immunoassay comprises:

(a) an antibody which specifically binds to a gene
product of the mutant CF gene having a mutation at one of
the positions of 85, 148, 178, 455, 493, 507, 542, 549,
551, 560, 563, 574, 1077 and 1092;

(b) reagent means for detecting the binding of the
antibody to the gene product; and

(c) the antibody and reagent means each being
present in amounts effective to perform the immunoassay.

The kit for assaying for the presence of the mutant
CF gene may also be provided by hybridization techniques.
The kit comprises:

(a) an oligonucleotide probe which specifically
binds to the mutant CF gene having a mutation at one of
the positions 85, 148, 178, 455, 493, 507, 542, 549, 551,
560, 563, 574, 1077 and 1092;

(b) reagent means for detecting the hybridization
of the oligonucleotide probe to the mutant CF gene; and

(c) the probe and reagent means each being present
in amounts effective to perform the hybridization assay.

As mentioned, antibodies to epitopes within the
mutant CFTR protein at positions 85, 148, 178, 455, 493,
507, 542, 549, 551, 560, 563, 574, 1077 and 1092 are
raised to provide extensive information of the ...
synthesis of a protein with an altered size.

2. Antibodies to distinct domains of the mutant
protein can be used to determine the topological
arrangement of the protein in the cell membrane. This
provides information on segments of the protein which
are accessible to externally added modulating
agents for purposes of drug therapy.

3. The structure-function relationships of
portions of the protein can be examined using
specific antibodies ... the charged cytoplasmic loops
which join the transmembrane sequences as well as
portions of the nucleotide binding folds ... The
influence of these antibodies on functional
parameters of the protein provides insight into cell
regulatory mechanisms and potentially suggests means
of modulating the activity of the defective protein
in a CF patient.

4. Antibodies with the appropriate avidity also 
enable immunoprecipitation and immuno-affinity 
purification of the protein. Immunoprecipitation 
will facilitate characterization of synthesis and 
post translational modification including ATP 
binding and phosphorylation. Purification will be 
required for studies of protein structure and for 
reconstitution of its function, as well as protein
based therapy.

In order to prepare the antibodies, fusion proteins
containing defined portions of any one of the mutant CFTR
polypeptides can be synthesized in bacteria by expression
of the corresponding mutant DNA sequence in a suitable
cloning vehicle. Smaller peptides may be synthesized
chemically. The fusion proteins can be purified, for
example, by affinity chromatography on
glutathione-agarose and the peptides coupled to a carrier
protein (hemocyanin), mixed with Freund's adjuvant and
injected into rabbits. Following booster injections at
bi-weekly intervals, the rabbits are bled and sera
isolated. The developed polyclonal antibodies in the sera
may then be combined with the fusion proteins. Immunoblots
are then formed by staining with, for example,
alkaline-phosphatase conjugated second antibody in
accordance with the procedure of Blake et al, Anal.
Biochem. 136:175 (1984).

Thus, it is possible to raise polyclonal antibodies
specific for both fusion proteins containing portions of
the mutant CFTR protein and peptides corresponding to
short segments of its sequence. Similarly, mice can be
injected with KLH conjugates of peptides to initiate the
production of monoclonal antibodies to corresponding
segments of mutant CFTR protein.

As for the generation of monoclonal antibodies,
immunogens for the raising of monoclonal antibodies
(mAbs) to the mutant CFTR protein are bacterial fusion
proteins [Smith et al, Gene 67:31 (1988)] containing
portions of the CFTR polypeptide or synthetic peptides
corresponding to short (12 to 25 amino acids in length)
segments of the mutant sequence. The essential
methodology is that of Kohler and Milstein [Nature 256:
495 (1975)].

Balb/c mice are immunized by intraperitoneal
injection with 500 μg of pure fusion protein or synthetic
peptide in incomplete Freund's adjuvant. A second
injection is given after 14 days, a third after 21 days
and a fourth after 28 days. Individual animals so
immunized are sacrificed one, two and four weeks
following the final injection. Spleens are removed,
their cells dissociated, collected and fused with
Sp2/0-Ag14 myeloma cells according to Gefter et al ... The
fusion mixture is distributed in culture medium selective
for the propagation of fused cells which are grown until
they are about 25% confluent. At this time, culture
supernatants are tested for the presence of antibodies
reacting with a particular CFTR antigen ...
chromatography on Protein G- or Protein A-agarose
according to Ey et al ... protein can be confirmed by
polyacrylamide gel electrophoresis of membranes isolated
from epithelial cells in which it is expressed and
immunoblotted [Towbin et al, Proc. Natl. Acad. Sci. USA
76:4350 (1979)].

In addition to the use of monoclonal antibodies 
specific for the particular mutant domain of the CFTR 
protein to probe their individual functions, other mAbs,
which can distinguish between the normal and mutant forms
of CFTR protein, are used to detect the mutant protein in
epithelial cell samples obtained from patients, such as
nasal mucosa biopsy "brushings" [R. De-Lough and J.
Rutland, J. Clin. Pathol. 42:613 (1989)] or skin biopsy
specimens containing sweat glands.

Antibodies capable of this distinction are obtained 
by differentially screening hybridomas from paired sets 
of mice immunized with a peptide containing, for example,
the isoleucine at amino acid position 507 (e.g.
GTIKENIIFGVSY) or a peptide which is identical except for
the absence of I507 (GTIKENIFGVSY). mAbs capable of
recognizing the other mutant forms of CFTR protein
present in patients in addition to or instead of the I507
deletion are obtained using similar monoclonal antibody
production strategies.
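
The relationship between the two screening peptides can be
checked mechanically. The following Python sketch is
illustrative only: the function name delete_residue and the
peptide-local, 1-based numbering are assumptions introduced
here and are not part of the disclosure. It removes each
residue of the normal peptide in turn and records which
removals reproduce the deletion peptide.

```python
# Peptide sequences as given in the text above.
NORMAL_PEPTIDE = "GTIKENIIFGVSY"    # contains the isoleucine at position 507
DELETION_PEPTIDE = "GTIKENIFGVSY"   # identical except for the absence of I507


def delete_residue(peptide: str, index: int) -> str:
    """Return `peptide` with the residue at 1-based `index` removed.

    `index` counts within the peptide only (hypothetical local numbering),
    not within the full-length CFTR sequence.
    """
    if not 1 <= index <= len(peptide):
        raise ValueError("index lies outside the peptide")
    return peptide[:index - 1] + peptide[index:]


# Peptide-local positions whose removal yields the deletion peptide.
matching_positions = [
    i for i in range(1, len(NORMAL_PEPTIDE) + 1)
    if delete_residue(NORMAL_PEPTIDE, i) == DELETION_PEPTIDE
]

if __name__ == "__main__":
    for i in matching_positions:
        print(i, NORMAL_PEPTIDE[i - 1])
```

Because the normal peptide contains two adjacent
isoleucines, removing either one gives the same product, so
the peptide pair by itself does not distinguish which of
the two is absent.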

Antibodies to normal and CF versions of CFTR protein 
and of segments thereof are used diagnostically in
immunocytochemical and immunofluorescence light
microscopy and immunoelectron microscopy to demonstrate
the tissue, cellular and subcellular distribution of CFTR 
within the organs of CF patients, carriers and non-CF 
individuals. 

Antibodies are used to therapeutically modulate, by
promoting, the activity of the CFTR protein in CF patients
and in cells of CF patients. Possible modes of such
modulation might involve stimulation due to cross-linking
of CFTR protein molecules with multivalent antibodies in
analogy with stimulation of some cell surface membrane
receptors, such as the insulin receptor [O'Brien et al,
Euro. Mol. Biol. Organ. J. 6:4003 (1987)], epidermal
growth factor receptor [Schreiber et al, J. Biol. Chem.,



WO 91/10734 



PCT/CA91/00009 






258:846 (1983)] and T-cell receptor-associated molecules
such as CD4 [Veillette et al, Nature 338:257 (1989)].

Antibodies are used to direct the delivery of 
therapeutic agents to the cells which express defective 
CFTR protein in CF. For this purpose, the antibodies are
incorporated into a vehicle such as a liposome [Matthay
et al, Cancer Res. 46:4904 (1986)] which carries the
therapeutic agent such as a drug or the normal gene.

RFLP ANALYSIS

DNA diagnosis is currently being used to assess 
whether a fetus will be born with cystic fibrosis, but 
historically this has only been done after a particular 
set of parents has already had one cystic fibrosis child 
which identifies them as obligate carriers. However, in
combination with carrier detection as outlined above, DNA
diagnosis for all pregnancies of carrier couples will be
possible. If the parents have already had a cystic
fibrosis child, an extended haplotype analysis can be
done on the fetus and thus the percentage of false
positives or false negatives will be greatly reduced. If
the parents have not already had an affected child and
the DNA diagnosis on the fetus is being performed on the
basis of carrier detection, haplotype analysis can still
be performed.
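
Since CF is an autosomal recessive disorder, the expected
genotype proportions for a pregnancy of a carrier couple
follow from simple Mendelian enumeration. The sketch below
is a minimal illustration under stated assumptions: each
parent carries one normal allele (labelled "N" here) and one
mutant allele (labelled "m"), each allele is transmitted
with equal probability, and the function name and allele
labels are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def offspring_distribution(parent1: str, parent2: str) -> dict:
    """Enumerate offspring genotypes for two parental genotypes.

    Each parent is a two-character string of allele labels, e.g. "Nm".
    Every combination of one allele from each parent is equally likely.
    """
    counts = Counter(
        "".join(sorted(a + b)) for a, b in product(parent1, parent2)
    )
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


# Both parents are carriers (heterozygous).
carrier_couple = offspring_distribution("Nm", "Nm")

if __name__ == "__main__":
    for genotype, probability in sorted(carrier_couple.items()):
        print(genotype, probability)
```

Under these assumptions the homozygous mutant genotype
("mm") corresponds to an affected fetus and the
heterozygous genotype ("Nm") to a carrier.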

Although it has been thought for many years that 
there is a great deal of clinical heterogeneity in the 
cystic fibrosis disease, it is now emerging that there 
are two general categories, called pancreatic sufficiency
(CF-PS) and pancreatic insufficiency (CF-PI). If the
mutations related to these disease categories are well
characterized, one can associate a particular mutation
with a clinical phenotype of the disease. This allows
changes in the treatment of each patient. Thus the
nature of the mutation will to a certain extent predict 
the prognosis of the patient and indicate a specific 
treatment. 
MOLECULAR BIOLOGY OF CYSTIC FIBROSIS

The postulate that CFTR may regulate the activity of
ion channels, particularly the outwardly rectifying Cl
channel implicated as the functional defect in CF, can be
tested by the injection and translation of full length in
vitro transcribed CFTR mRNA in Xenopus oocytes. The
ensuing changes in ion currents across the oocyte
membrane can be measured as the potential is clamped at a
fixed value. CFTR may regulate endogenous oocyte
channels or it may be necessary to also introduce
epithelial cell RNA to direct the translation of channel
proteins. Use of mRNA coding for normal and for mutant
CFTR, as provided by this invention, makes these
experiments possible.
Other modes of expression in heterologous cell
systems also facilitate dissection of structure-function
relationships. The complete CFTR DNA sequence ligated
into a plasmid expression vector is used to transfect
cells so that its influence on ion transport can be
assessed. Plasmid expression vectors containing part of
the normal CFTR sequence along with portions of modified
sequence at selected sites can be used in in vitro
mutagenesis experiments performed in order to identify
those portions of the CFTR protein which are crucial for
regulatory function.
The mutant DNA sequence can be manipulated in
studies to understand the expression of the gene and its
product, and to achieve production of large quantities
of the protein for functional analysis, antibody
production, and patient therapy. The changes in the
sequence may or may not alter the expression pattern in
terms of relative quantities, tissue-specificity and
functional properties. The partial or full-length cDNA
sequences, which encode for the subject protein,
unmodified or modified, may be ligated to bacterial
expression vectors such as the pRIT (Nilsson et al,
EMBO J. 4:1075-1080 (1985)), pGEX (Smith and Johnson,
Gene 67:31-40 (1988)) or pATH (Spindler et al, J. Virol.
49:132-141 (1984)) plasmids which can be introduced into
E. coli cells for production of the corresponding
proteins, which may be isolated in accordance with the
previously discussed protein purification procedures.
The DNA sequence can also be transferred from its
existing context to other cloning vehicles, such as other
plasmids, bacteriophages, cosmids, animal virus, yeast
artificial chromosomes (YAC) (Burke et al, Science
236:806-812 (1987)), somatic cells, and other simple or
complex organisms, such as bacteria, fungi (Timberlake
and Marshall, Science 244:1313-1317 (1989)),
invertebrates, plants (Gasser and Fraley, Science
244:1293 (1989)), and pigs (Pursel et al, Science
244:1281-1288 (1989)).

For expression in mammalian cells, the cDNA sequence 
may be ligated to heterologous promoters, such as the 
simian virus (SV) 40 promoter in the pSV2 vector
[Mulligan and Berg, Proc. Natl. Acad. Sci. USA
78:2072-2076 (1981)] and introduced into cells, such as
monkey COS-1 cells [Gluzman, Cell 23:175-182 (1981)], to
achieve transient or long-term expression. The stable
integration of the chimeric gene construct may be
maintained in mammalian cells by biochemical selection,
such as neomycin [Southern and Berg, J. Mol. Appl.
Genet. 1:327-341 (1982)] and mycophenolic acid [Mulligan
and Berg, supra].

DNA sequences can be manipulated with standard 
procedures such as restriction enzyme digestion, fill-in 
with DNA polymerase, deletion by exonuclease, extension 
by terminal deoxynucleotide transferase, ligation of 
synthetic or cloned DNA sequences, site-directed 
sequence-alteration via single-stranded bacteriophage
intermediate or with the use of specific oligonucleotides
in combination with PCR.
The cDNA sequence (or portions derived from it), or
a minigene (a cDNA with an intron and its own promoter),
is introduced into eukaryotic expression vectors by
conventional techniques. These vectors are designed to
permit the transcription of the cDNA in eukaryotic cells
by providing regulatory sequences that initiate and
enhance the transcription of the cDNA and ensure its
proper splicing and polyadenylation. Vectors containing
[...] Gorman et al, Proc. Natl. [...]

Alternatively, the CFTR endogenous promoter may be used.
The level of expression of the cDNA can be manipulated
with this type of vector, either by using promoters that
have different activities (for example, the baculovirus
pAC373 can express cDNAs at high levels in [...]
Genetically Altered Viruses and the Environment (B.
Fields, et al, eds.) [...] Cold Spring Harbor Laboratory
[...]) or by using vectors that contain promoters
amenable to modulation, for example the glucocorticoid-
responsive promoter from the mouse mammary tumor virus
[Lee et al, [...]]. [...] (transient expression).

In addition, some vectors contain selectable markers
such as the gpt [Mulligan and Berg, supra] or neo
[Southern and Berg, J. Mol. Appl. Genet. 1:327-341
(1982)] bacterial genes that permit isolation of cells by
[...] recipient cell. The vectors can be maintained in
the cells as episomal, freely replicating entities by
using regulatory elements of viruses such as papilloma
[Sarver et al, Mol. Cell. Biol. 1:486 (1981)] or
Epstein-Barr [Sugden et al, Mol. Cell. Biol. 5:410
(1985)].
Alternatively, one can also produce cell lines that have
integrated the vector into genomic DNA. Both of these
types of cell lines produce the gene product on a
continuous basis. One can also produce cell lines that
have amplified the number of copies of the vector (and
therefore of the cDNA as well) to create cell lines that
can produce high levels of the gene product [Alt et al,
J. Biol. Chem. 253:1357 (1978)].

The transfer of DNA into eukaryotic, in particular
human or other mammalian, cells is now a conventional
technique. The vectors are introduced into the recipient
cells as pure DNA (transfection) by, for example,
precipitation with calcium phosphate [Graham and van der
Eb, Virology 52:466 (1973)] or strontium phosphate [Brash
et al, Mol. Cell. Biol. 7:2013 (1987)], electroporation
[Neumann et al, EMBO J. 1:841 (1982)], lipofection
[Felgner et al, Proc. Natl. Acad. Sci. USA 84:7413
(1987)], DEAE dextran [McCuthan et al, J. Natl. Cancer
Inst. (1968)], microinjection [Mueller et al, Cell 15:579
(1978)], protoplast fusion [Schafner, Proc. Natl. Acad.
Sci. USA 72:2163] or pellet guns [Klein et al, Nature
327:70 (1987)]. Alternatively, the cDNA can be
introduced by infection with virus vectors. Systems are
developed that use, for example, retroviruses [Bernstein
et al, Genetic Engineering 7:235 (1985)], adenoviruses
[Ahmad et al, J. Virol. 57:267 (1986)] or Herpes virus
[Spaete et al, Cell 30:295 (1982)].

These eukaryotic expression systems can be used for
many studies of the mutant CF gene and the mutant CFTR
product, such as at protein positions 85, 148, 178, 455,
493, 507, 542, 549, 551, 560, 563, 574, 1077 and 1092.
These include, for example: (1) determination that the
gene is properly expressed and that all post-
translational modifications necessary for full biological
activity [...] regulatory elements located in [...] of
drugs; (5) study the function of the [...] protein [...]
produced mutant proteins. Naturally occurring mutant
proteins exist in [...]

[...] other species or non-mammalian cells as
desired. The choice of cell is determined by the purpose
of the treatment. For example, [...] COS cells [Gluzman,
Cell 23:175 (1981)] [...] produce high levels of the SV40
T antigen [...] of vectors containing [...] fibroblasts
or lymphoblasts.

[...] sequences of this invention for expression in a
suitable host. The DNA is operatively linked in the
vector to an expression control sequence in the
recombinant DNA molecule so that normal CFTR polypeptide
can be
expressed. The expression control sequence may be
selected from the group consisting of sequences that
control the expression of genes of prokaryotic or
eukaryotic cells and their viruses and combinations
thereof. The expression control sequence may be
specifically selected from the group consisting of the
lac system, the trp system, the tac system, the trc
system, major operator and promoter regions of phage
lambda, the control region of fd coat protein, the early
and late promoters of SV40, promoters derived from
polyoma, adenovirus, retrovirus, baculovirus and simian
virus, the promoter for 3-phosphoglycerate kinase, the
promoters of yeast acid phosphatase, the promoter of the
yeast alpha-mating factors and combinations thereof.

The host cell, which may be transfected with the 
vector of this invention, may be selected from the group 

consisting of E. coli, Pseudomonas, Bacillus subtilis,
Bacillus stearothermophilus or other bacilli; other
bacteria; yeast; fungi; insect; mouse or other animal;
or plant hosts; or human tissue cells.

It is appreciated that for the mutant DNA sequence 
similar systems are employed to express and produce the 
mutant product. 

PROTEIN FUNCTION CONSIDERATIONS

To study the function of the mutant CFTR protein, it
is preferable to use epithelial cells as recipients,
since proper functional expression may require the
presence of other pathways or gene products that are only
expressed in such cells. Cells that can be used include,
for example, human epithelial cell lines such as T84
(ATCC #CRL 248) or PANC-1 (ATCC #CLL 1469), or the T43
immortalized CF nasal epithelium cell line [Jettan et al,
Science (1989)] and primary [Yanhoskes et al, Am. Rev.
Resp. Dis. 132:1281 (1985)] or transformed [Scholte et
al, Exp. Cell. Res. 182:559 (1989)] human nasal polyp or
airways cells, pancreatic cells [Harris and Coleman, J.
Cell. Sci. 87:695 (1987)], or sweat gland cells [Collie
et al, In Vitro 21:597 (1985)] derived from normal or CF
subjects. The CF cells can be used to test for the
functional activity of mutant CF genes. Current
functional assays available include the study of the
movement of anions (Cl or I) across cell membranes as a
function of stimulation of cells by agents that raise
intracellular cyclic AMP levels and activate chloride
channels [Stutto et al, Proc. Nat. Acad. Sci. U.S.A.
82:6677 (1985)]. Other assays include the measurement of
changes in cellular potentials by patch clamping of whole
cells or of isolated membranes [Frizzell et al, Science
233:558 (1986); Welsch and Liedtke, Nature 322:467
(1986)] or the study of ion fluxes in epithelial sheets
of confluent cells [Widdicombe et al, Proc. Nat. Acad.
Sci. 82:6167 (1985)]. Alternatively, RNA made from the
CF gene could be injected into Xenopus oocytes. The
oocyte will translate RNA into protein and allow its
study. As other more specific assays are developed these
can also be used in the study of transfected mutant CFTR
protein function.
"Domain-switching" experiments between mutant CFTR
and the human multidrug resistance P-glycoprotein can
also be performed to further the study of the mutant CFTR
protein. In these experiments, plasmid expression vectors
are constructed by routine techniques from fragments of
the mutant CFTR sequence and fragments of the sequence of
P-glycoprotein ligated together by DNA ligase so that a
protein containing the respective portions of these two
proteins will be synthesized by a host cell transfected
with the plasmid. The latter approach has the advantage
that many experimental parameters associated with
multidrug resistance can be measured. Hence, it is now
possible to assess the ability of segments of mutant CFTR
to influence these parameters.

These studies of the influence of mutant CFTR on ion
transport will serve to bring the field of epithelial
transport into the molecular arena.
THERAPIES

It is understood that the major aim of the various 
biochemical studies using the compositions of this 
invention is the development of therapies to circumvent 
or overcome the CF defect, using both the pharmacological 
and the "gene-therapy" approaches. 

In the pharmacological approach, drugs which
circumvent or overcome the CF defect are sought.
Initially, compounds may be tested essentially at random,
and screening systems are required to discriminate among
many candidate compounds. This invention provides host
cell systems, expressing various of the mutant CF genes,
which are particularly well suited for use as first level
screening systems. Preferably, a cell culture system
using mammalian cells (most preferably human cells)
transfected with an expression vector comprising a DNA
sequence coding for CFTR protein containing a CF-
generating mutation, for example the I507 deletion, is
used in the screening process. Candidate drugs are
tested by incubating the cells in the presence of the
candidate drug and measuring those cellular functions
dependent on CFTR, especially by measuring ion currents
where the transmembrane potential is clamped at a fixed
value. To accommodate the large number of assays,
however, more convenient assays are based, for example,
on the use of ion-sensitive fluorescent dyes. To detect
changes in Cl- ion concentration, SPQ or its analogues are
useful.
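
Readings from a chloride-sensitive dye such as SPQ are
commonly interpreted with the Stern-Volmer relation for
collisional quenching, F0/F = 1 + K[Cl-], where F0 is the
fluorescence in the absence of chloride. The Python sketch
below shows how a fluorescence reading could be converted
to a chloride concentration on that basis. It is an
assumption-laden illustration rather than a protocol from
this disclosure: the function name is hypothetical, and the
quenching constant K must be calibrated for the actual
cells and instrument (the value used in the example call is
a placeholder).

```python
def chloride_from_fluorescence(f0: float, f: float, k_sv: float) -> float:
    """Estimate chloride concentration from dye fluorescence.

    Rearranges the Stern-Volmer relation F0/F = 1 + K*[Cl-] to
    [Cl-] = (F0/F - 1) / K.

    f0   -- fluorescence with no chloride present (arbitrary units)
    f    -- measured fluorescence (same units as f0)
    k_sv -- quenching constant in 1/mM (assumed; calibrate experimentally)
    """
    if f0 <= 0 or f <= 0 or k_sv <= 0:
        raise ValueError("f0, f and k_sv must all be positive")
    return (f0 / f - 1.0) / k_sv


if __name__ == "__main__":
    # Placeholder numbers chosen only to exercise the function.
    print(chloride_from_fluorescence(f0=100.0, f=50.0, k_sv=0.01))
```

In a first-level screen, a candidate drug would be scored
by the change in this estimate between treated and
untreated cells expressing the mutant protein.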

Alternatively, a cell-free system could be used.
Purified CFTR could be reconstituted into artificial
membranes and drugs could be screened in a cell-free
assay [Al-Aqwatt, Science, (1989)].

At the second level, animal testing is required. It
is possible to develop a model of CF by interfering with
the normal expression of the counterpart of the CF gene
in an animal such as the mouse. The "knock-out" of this
gene by introducing a mutant form of it into the germ
line of animals will provide a strain of animals with CF-
like syndromes. This enables testing of drugs which
showed promise in the first level cell-based screen.

As further knowledge is gained about the nature of
the protein and its function, it will be possible to
predict structures of proteins or other compounds that
interact with the CFTR protein. That in turn will allow
for certain predictions to be made about potential drugs
that will interact with this protein and have some effect
on the treatment of the patients. Ultimately such drugs
may be designed and synthesized chemically on the basis
of structures predicted to be required to interact with
domains of CFTR. This approach is reviewed in Capsey and
Delvatte, Genetically Engineered Human Therapeutic Drugs,
Stockton Press, New York, 1988. These potential drugs
must also be tested in the screening system.

Treatment of CF can be performed by replacing the
defective protein with normal protein, by modulating the
function of the defective protein or by modifying another
step in the pathway in which CFTR participates in order
to correct the physiological abnormality.

To be able to replace the defective protein with the
normal version, one must have reasonably large amounts of
pure CFTR protein. Pure protein can be obtained as
described earlier from cultured cell systems. Delivery
of the protein to the affected airways tissue will
require its packaging in lipid-containing vesicles that
facilitate the incorporation of the protein into the cell
membrane. It may also be feasible to use vehicles that
incorporate proteins such as surfactant protein, such as
SAP(Val) or SAP(Phe), that perform this function
naturally, at least for lung alveolar cells (PCT Patent
Application WO/8803170, Whitsett et al, May 7, 1988 and
PCT Patent Application WO89/04327, Benson et al, May 18,
1989). The CFTR-containing vesicles are introduced into
the airways by inhalation or irrigation, techniques that
are currently used in CF treatment (Boat et al, supra).

DRUG THERAPY

Modulation of CFTR function can be accomplished by
the use of therapeutic agents (drugs). These can be
identified by random approaches using a screening program
in which their effectiveness in modulating the defective
CFTR protein is monitored in vitro. Screening programs
can use cultured cell systems in which the defective CFTR
protein is expressed. Alternatively, drugs can be
designed to modulate CFTR activity from knowledge of the
structure and function correlations of CFTR protein and
from knowledge of the specific defect in the CFTR mutant
protein (Capsey and Delvatte, supra). It is possible
that the mutant CFTR protein will require a different
drug for specific modulation. It will then be necessary
to identify the specific mutation(s) in each CF patient
before initiating drug therapy.

Drugs can be designed to interact with different
aspects of CFTR protein structure or function. For
example, a drug (or antibody) can bind to a structural
fold of the protein to correct a defective structure.
Alternatively, a drug might bind to a specific functional
residue and increase its affinity for a substrate or
cofactor. Since it is known that members of the class
of proteins to which CFTR has structural homology can
interact, bind and transport a variety of drugs, it is
reasonable to expect that drug-related therapies may be
effective in treatment of CF.

A third mechanism for enhancing the activity of an
effective drug would be to modulate the production or the
stability of CFTR inside the cell. This increase in the 
amount of CFTR could compensate for its defective 
function. 

Drug therapy can also be used to compensate for the
defective CFTR function by interactions with other
components of the physiological or biochemical pathway
necessary for the expression of the CFTR function. These
interactions can lead to increases or decreases in the
activity of these ancillary proteins. The methods for the
identification of these drugs would be similar to those
described above for CFTR-related drugs.

In other genetic disorders, it has been possible to
correct the consequences of altered or missing normal
functions by use of dietary modifications. This has
taken the form of removal of metabolites, as in the case
of phenylketonuria, where phenylalanine is removed from
the diet in the first five years of life to prevent
mental retardation, or by the addition of large amounts
of metabolites to the diet, as in the case of adenosine
deaminase deficiency where the functional correction of
the activity of the enzyme can be produced by the
addition of the enzyme to the diet. Thus, once the
details of the CFTR function have been elucidated and the
basic defect in CF has been defined, therapy may be
achieved by dietary manipulations.

The second potential therapeutic approach is so-
called "gene-therapy" in which normal copies of the CF
gene are introduced into patients so as to successfully
code for normal protein in the key epithelial cells of
affected tissues. It is most crucial to attempt to
achieve this with the airway epithelial cells of the
respiratory tract. The CF gene is delivered to these
cells in a form in which it can be taken up and code for
sufficient protein to provide regulatory function. As a
result, the patient's quality and length of life will be
greatly extended. Ultimately, of course, the aim is to
deliver the gene to all affected tissues.

GENE THERAPY

One approach to therapy of CF is to insert a normal
version of the CF gene into the airway epithelium of
affected patients. It is important to note that the
respiratory system is the primary cause of mortality and
morbidity in CF; while pancreatic disease is a major
feature, it is relatively well treated today with enzyme
supplementation. Thus, somatic cell gene therapy [for a
review, see T. Friedmann, Science 244:1275 (1989)]
targeting the airway would alleviate the most severe
problems associated with CF.

A. Retroviral Vectors. Retroviruses have been
considered the preferred vector for experiments in
somatic gene therapy, with a high efficiency of infection
and stable integration and expression [Orkin et al, Prog.
Med. Genet. 7:130 (1988)]. A possible drawback is that
cell division is necessary for retroviral integration, so
that the targeted cells in the airway may have to be
nudged into the cell cycle prior to retroviral infection,
perhaps by chemical means. The full length CF gene cDNA
can be cloned into a retroviral vector and driven from
either its endogenous promoter or from the retroviral LTR
(long terminal repeat). Expression of levels of the
normal protein as low as 10% of the endogenous mutant
protein in CF patients would be expected to be
beneficial, since this is a recessive disease. Delivery
of the virus could be accomplished by aerosol or
instillation into the trachea.

B. Other Viral Vectors. Other delivery systems
which can be utilized include adeno-associated virus
[AAV, McLaughlin et al, J. Virol. 62:1963 (1988)],
vaccinia virus [Moss et al, Annu. Rev. Immunol. 5:305
(1987)], bovine papilloma virus [Rasmussen et al, Methods
Enzymol. 139:642 (1987)] or a member of the herpesvirus
group such as Epstein-Barr virus [Margolskee et al, Mol.
Cell. Biol. 8:2937 (1988)]. Though much would need to be
learned about their basic biology, the idea of using a
viral vector with natural tropism for the respiratory
tract (e.g. respiratory syncytial virus, echovirus,
Coxsackie virus, etc.) is possible.
C. Non-viral Gene Transfer. Other methods of
inserting the CF gene into respiratory epithelium may
also be productive; many of these are lower efficiency
and would potentially require infection in vitro,
selection of transfectants, and reimplantation. This
would include calcium phosphate, DEAE dextran,
electroporation, and protoplast fusion. A particularly
attractive idea is the use of liposomes, which might be
possible to carry out in vivo [Ostro, Liposomes, Marcel-
Dekker, 1987]. Synthetic cationic lipids such as DOTMA
[Felgner et al, Proc. Natl. Acad. Sci. USA 84:7413 (1987)]
may increase the efficiency and ease of carrying out this
approach.

CF ANIMAL MODELS

The creation of a mouse or other animal model for CF
will be crucial to understanding the disease and for
testing of possible therapies (for general review of
creating animal models, see Erickson, Am. J. Hum. Genet.
43:582 (1988)). Currently no animal model of CF
exists. The evolutionary conservation of the CF gene
[...], as is shown in Figure 4, indicates that an
orthologous gene exists in the mouse (hereafter to be
denoted mCF, and its corresponding protein as mCFTR), and
[...] libraries using the human CF gene probes. It is
expected that the generation of a specific mutation in
the mouse gene analogous to the I507 mutation will be
most optimum to reproduce the phenotype, though complete
inactivation of the mCFTR gene will also be a useful
mutant to generate.

A. Mutagenesis. Inactivation of the mCF gene can
be achieved by chemical (e.g. Johnson et al [...]) or
X-ray mutagenesis [Popp et al, J. Mol. Biol. 127:141
(1979)] of mouse gametes, followed by fertilization.
Offspring heterozygous for inactivation of mCFTR can then
be identified by Southern blotting to demonstrate loss of
one allele by dosage, or failure to inherit one parental
allele if an RFLP marker is being assessed. This
approach has previously been successfully used to
identify mouse mutants for α-globin [Whitney et al, Proc.
Natl. Acad. Sci. USA 77:1087 (1980)], phenylalanine
hydroxylase [McDonald et al, Pediatr. Res. 23:63 (1988)],
and carbonic anhydrase II [Lewis et al, Proc. Natl. Acad.
Sci. USA 85:1962 (1988)].
B. Transgenics. A mutant version of CFTR or mouse
CFTR can be inserted into the mouse germ line using now
standard techniques of oocyte injection [Camper, Trends
in Genetics (1988)]; alternatively, if it is desirable to
inactivate or replace the endogenous mCF gene, the
homologous recombination system using embryonic stem (ES)
cells [Capecchi, Science 244:1288 (1989)] may be applied.

1. Oocyte Injection. Placing one or more copies
of the normal or mutant mCF gene at a random location in
the mouse germline can be accomplished by microinjection
of the pronucleus of a just-fertilized mouse oocyte
followed by reimplantation into a pseudo-pregnant foster
mother. The liveborn mice can then be screened for
integrants using analysis of tail DNA for the presence of
human CF gene sequences. The same protocol can be used
to insert a mutant mCF gene. To generate a mouse model
one would want to place this transgene in a mouse
background where the endogenous mCF gene has been
inactivated, either by mutagenesis (see above) or by
homologous recombination (see below). The transgene can
be either: a) a complete genomic sequence, though the
size of this (about 250 kb) would require that it be
injected as a yeast artificial chromosome or a chromosome
fragment; b) a cDNA with either the natural promoter or a
heterologous promoter; c) a "minigene" containing all of
the coding region and various other elements such as
introns, promoter, and 3' flanking elements found to be
necessary for optimum expression.

2. Retroviral Infection of Early Embryos.
This alternative involves inserting the CFTR or mCF gene
into a retroviral vector and directly infecting mouse
embryos at early stages of development, generating a
chimera [Soriano et al, Cell 46:19 (1986)]. At least some
of these will lead to germline transmission.

3. ES Cells and Homologous Recombination. The
embryonic stem cell approach [Capecchi, supra and
Capecchi, Trends Genet. 5:70 (1989)] allows the
possibility of performing gene transfer and then
screening the resulting totipotent cells to identify the
rare homologous recombination events. Once identified,
these can be used to generate chimeras by injection of
mouse blastocysts, and a proportion of the resulting mice
will show germline transmission from the recombinant
line. There are several ways this could be useful in the
generation of a mouse model for CF:

a) Inactivation of the mCF gene can be conveniently
accomplished by designing a DNA fragment which contains
sequences from a mCFTR exon flanking a selectable marker
such as neo. Homologous recombination will lead to
insertion of the sequences in the middle of an exon,
inactivating mCFTR. The homologous recombination events
(usually about 1 in 1000) can be recognized from the
heterologous ones by DNA analysis of individual clones
[[...] (1988); Joyner et al, Nature 338:153 (1989);
Zimmer et al [...]] or by using a negative selection
against the heterologous events [such as the use of an
HSV TK gene at the end of the construct, followed by the
gancyclovir selection, Mansour et al, Nature 336:348
(1988)]. This inactivated mCFTR mouse can then be used
to introduce a mutant CF gene or mCF gene containing, for
example, the I507 abnormality or any other desired
mutation.

b) It is possible that specific mutants of mCFTR
can be created in one step. For example, one can make a
construct containing mCF intron 9 sequences at the 5'
end, a selectable neo gene in the middle, and intron 9 +
exon 10 (containing the mouse version of the I507
mutation) at the 3' end. A homologous recombination
event would lead to the insertion of the neo gene in
intron 9 and the replacement of exon 10 with the mutant
version.

c) If the presence of the selectable neo marker in
the intron altered expression of the mCF gene, it would be
possible to excise it in a second homologous
recombination step.

d) It is also possible to create mutations in the
mouse germline by injecting oligonucleotides containing
the mutation of interest and screening the resulting
cells by PCR.
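
The frequency of about 1 in 1000 quoted in a) gives a rough sense of how many clones need to be screened before a homologous recombinant is likely to be found. The following is a minimal sketch, assuming each clone is an independent trial with the same probability p of being a homologous recombinant. That model is an idealization not stated in the specification, and the function name is illustrative only. The number of clones n follows from requiring 1 - (1 - p)^n to reach the chosen confidence.

```python
import math


def clones_to_screen(p: float, confidence: float) -> int:
    """Smallest n such that P(at least one homologous recombinant) >= confidence.

    Assumes independent clones, each a homologous recombinant with
    probability p (an illustrative model, not taken from the specification).
    """
    if not (0.0 < p < 1.0 and 0.0 < confidence < 1.0):
        raise ValueError("p and confidence must lie strictly between 0 and 1")
    # Solve 1 - (1 - p)**n >= confidence for n and round up.
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p))


if __name__ == "__main__":
    for confidence in (0.50, 0.95, 0.99):
        print(confidence, clones_to_screen(1 / 1000, confidence))
```

Under these assumptions, a frequency of 1 in 1000 implies screening on the order of a few thousand clones for a high chance of recovering at least one event. This is why the PCR-based analysis and negative selection mentioned in a) are practical necessities.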

This embodiment of the invention has considered
primarily a mouse model for cystic fibrosis. Figure 4
shows cross-species hybridization not only to mouse DNA,
but also to bovine, hamster and chicken DNA. Thus, it is
contemplated that an orthologous gene will exist in many
other species also. It is thus contemplated that it will
be possible to generate other animal models using similar
technology.

Although preferred embodiments of the invention have
been described herein in detail, it will be understood by
those skilled in the art that variations may be made
thereto without departing from the spirit of the
invention or the scope of the appended claims.
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CLAIMS: 

1. A DNA molecule comprising an intronless DNA sequence
encoding a mutant CFTR polypeptide having the sequence
according to Figure 1 for amino acid residue positions 1
to 1480 and, further characterized by nucleotide sequence
variants resulting in deletion or alteration of amino
acids of residue positions 85, 148, 178, 455, 493, 507,
542, 549, 551, 560, 563, 574, 1077 and 1092.

2. A DNA molecule comprising an intronless DNA sequence
encoding a mutant CFTR polypeptide having the sequence 
according to Figure 1 for DNA sequence positions 1 to 
4575 and, further characterized by nucleotide sequence 
variants resulting in deletion or alteration of DNA at 
DNA sequence positions 129, 556, 621+1, 711+1, 1717-1 and 
3659. 



3. A DNA molecule comprising an intronless DNA sequence
selected from the group consisting of:

(a) DNA sequences which correspond to the selected
sequence of claim 1 or 2 and which encode, on expression,
for mutant CFTR polypeptide;

(b) DNA sequences which correspond to a fragment of
a selected sequence in claim 1 or 2 including at least 16
nucleotides;

(c) DNA sequences which comprise at least 16
nucleotides and encode a fragment of the selected amino
acid sequence of claim 1 or 2; and

(d) DNA sequences encoding an epitope
characteristic of the mutant CFTR protein encoded by at
least 18 sequential nucleotides in the selected sequence
of claim 1 or 2.

4. The DNA molecule of claim 1 or 2 wherein the DNA
molecule is a cDNA.

5. The DNA molecule of claim 3 wherein the DNA molecule
is a cDNA.

6. A purified RNA molecule comprising an RNA sequence 
corresponding to the DNA sequence recited in claim 3.

7. A purified nucleic acid probe comprising a DNA or
RNA nucleotide sequence corresponding to the selected
sequence recited in parts (b), (c), or (d) of claim 3.



8. A nucleic acid probe according to claim 7 wherein
said sequence comprises AAA GAA AAT ATC TTT GGT GTT, and
its complement.

9. A recombinant cloning vector comprising the DNA
molecule of claim 3. 

10. The vector of claim 9 wherein said DNA molecule is
operatively linked to an expression control sequence in
said recombinant DNA molecule so that a mutant CFTR
polypeptide can be expressed, said mutant CFTR
polypeptide being selected from the group of CFTR
polypeptides at mutant positions 85, 148, 178, 455, 493,
507, 542, 549, 551, 560, 563, 574, 1077 and 1092, said
expression control sequence being selected from the group
consisting of sequences that control the expression of
genes of prokaryotic or eukaryotic cells and their
viruses and combinations thereof.



11. The vector of claim 10 wherein said DNA molecule is
operatively linked to an expression control sequence in
said recombinant DNA molecule so that a mutant CFTR
polypeptide can be expressed, said mutant
polypeptide being selected from the group of CFTR
polypeptides at mutant DNA sequence positions 129, 556,
621+1, 711+1, 1717-1 and 3659, said expression control
sequence being selected from the group consisting of
sequences that control the expression of genes of
prokaryotic or eukaryotic cells and their viruses and
combinations thereof.

12. The vector of claim 10 or 11 wherein the expression
control sequence is selected from the group consisting of
the lac system, the trp system, the tac system, the trc
system, major operator and promoter regions of phage
lambda, the control region of fd coat protein, the early
and late promoters of SV40, promoters derived from
polyoma, adenovirus, retrovirus, baculovirus and simian
virus, the promoter for 3-phosphoglycerate kinase, the
promoters of yeast acid phosphatase, the promoter of the
yeast alpha-mating factors and combinations thereof.

13. A host transformed with the vector according to
claim [...].



14. The host of claim 13 selected from the group
consisting of strains of E. coli, Pseudomonas, Bacillus
subtilis, Bacillus stearothermophilus, or other bacilli;
other bacteria; yeast; fungi; insect; mouse or other
animal; plant hosts; or human tissue cells.

15. The host of claim 14 wherein said human tissue cells 
are human epithelial cells. 

16. A method for producing a mutant CFTR polypeptide
comprising the steps of:

(a) culturing a host cell transfected by the vector
of claim 8 in a medium and under conditions favorable for
expression of the mutant CFTR polypeptide selected from
the group having mutant positions 85, 148, 178, 455, 493,
507, 542, 549, 551, 560, 563, 574, 1077 and 1092; and

(b) isolating the expressed mutant CFTR
polypeptide.

17. A method for producing a mutant CFTR polypeptide
comprising the steps of:

(a) culturing a host cell transfected by the vector 
of claim 8 in a medium and under conditions favorable for 
expression of the mutant CFTR polypeptide selected from 
the group having mutant DNA sequence positions 129, 556, 
621+1, 711+1, 1717-1 and 3659; 

(b) isolating the expressed mutant CFTR 
polypeptide. 

18. A mutant CFTR polypeptide substantially free of 
other human proteins and encoded by the DNA sequence 
recited in claim 3. 

19. A substantially pure mutant CFTR polypeptide
according to claim 18 made by chemical or enzymatic 
peptide synthesis. 



20. A polypeptide coded for by expression of a DNA 
sequence recited in claim 3. 

21. A method for screening a subject to determine if 
said subject is a CF carrier or a CF patient comprising 
the steps of: 

providing a biological sample of the subject to be 
screened; and providing an assay for detecting in the 
biological sample, the presence of at least a member from 
the group consisting of a mutant CF gene, a mutant CFTR 
polypeptide products and mixtures thereof, the mutants 
being defined by mutations at protein positions 85, 148, 
178, 455, 493, 507, 542, 549, 551, 560, 563, 574, 1077 
and 1092. 

22. A method for screening a subject to determine if 
said subject is a CF carrier or a CF patient comprising 
the steps of: 
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providing a biological sample of the subject to be 
screened; and providing an assay for detecting in the 
biological sample, the presence of at least a member from 
the group consisting of a mutant CF gene, a mutant CFTR 
polypeptide products and mixtures thereof, the mutants
being defined by mutations at DNA sequence positions 129, 
556, 621+1, 711+1, 1717-1 and 3659. 

23. The method of claim 21 or 22 wherein the biological
sample includes at least part of the genome of the
subject and the assay comprises a hybridization assay.

24. The method of claim 23 wherein the assay further
comprises a labelled nucleotide probe according to claim
7.

25. The method of claim 24 wherein said probe comprises
the nucleotide sequence of claim 8.

26. The method of claim 21 or 22 wherein the biological
sample includes a CFTR polypeptide of the subject and the
assay comprises an immunological assay.

27. The method of claim 26 wherein the assay further
includes an antibody specific for said mutant CFTR
polypeptide.

28. The method of claim 26 wherein the assay is a
radioimmunoassay.

29. The method of claim 27 wherein the antibody is at
least one monoclonal antibody.

30. The method of claim 21 or 22 wherein the subject is
a human fetus in utero.



WO 91/10734 PCT/CA91/00009 


31. The method of claim 24 wherein the assay further 
includes at least one additional nucleotide probe 
according to claim 7. 

32. The method of claim 31, wherein the assay further
includes a second nucleotide probe comprising a different
DNA sequence fragment of the DNA of Figure 1 or its RNA
homologue or a different DNA sequence fragment of human
chromosome 7 and located to either side of the DNA
sequence of Figure 1.

33. In a process for screening a potential CF carrier or
patient to indicate the presence of an identified cystic
fibrosis mutation in the CF gene, said process including
the steps of:

(a) isolating genomic DNA from said potential CF
carrier or said potential patient;

(b) hybridizing a DNA probe onto said isolated
genomic DNA, said DNA probe spanning a mutation in said
CF gene wherein said DNA probe is capable of detecting
said mutation, said mutation being selected from the
group of mutations at protein positions 85, 148, 178,
455, 493, 507, 542, 549, 551, 560, 563, 574, 1077 and
1092;

(c) treating said genomic DNA to determine presence
or absence of said DNA probe and thereby indicating in
accordance with a predetermined manner of hybridization,
the presence or absence of said cystic fibrosis mutation.

34. In a process for screening a potential CF carrier or
patient to indicate the presence of an identified cystic
fibrosis mutation in the CF gene, said process including
the steps of:

(a) isolating genomic DNA from said potential CF
carrier or said potential patient;

(b) hybridizing a DNA probe onto said isolated
genomic DNA, said DNA probe spanning a mutation in said
CF gene wherein said DNA probe is capable of detecting
said mutation, said mutation being selected from the
group of mutations at DNA sequence positions 129, 556,
621+1, 711+1, 1717-1 and 3659.

35. A process for detecting cystic fibrosis carriers of
a mutant CF gene wherein said process consists of
determining differential mobility of heteroduplex PCR
products in polyacrylamide gels as a result of deletions
or alterations in the mutant CF gene at one or more of
the protein positions 85, 148, 178, 455, 493, 507, 542,
549, 551, 560, 563, 574, 1077 and 1092.

36. A process for detecting cystic fibrosis carriers of
a mutant CF gene wherein said process consists of
determining differential mobility of heteroduplex PCR
products in polyacrylamide gels as a result of deletions
or alterations in the mutant CF gene at one or more of
the DNA sequence positions 129, 556, 621+1, 711+1, 1717-1
and 3659.



37. A kit for assaying for the presence of a mutant CF
gene by immunoassay comprising:

(a) an antibody which specifically binds to a gene
product of a mutant CF gene having a mutation at a
protein position selected from the group consisting of
protein positions 85, 148, 178, 455, 493, 507, 542, 549,
551, 560, 563, 574, 1077 and 1092;

(b) reagent means for detecting the binding of the
antibody to the gene product; and

(c) the antibody and reagent means each being
present in amounts effective to perform the immunoassay.



38. A kit for assaying for the presence of a mutant CF
gene by immunoassay comprising:

(a) an antibody which specifically binds to a gene
product of a mutant CF gene having a mutation at a DNA
sequence position selected from the group consisting of
DNA sequence positions 129, 556, 621+1, 711+1, 1717-1 and
3659;

(b) reagent means for detecting the binding of the
antibody to the gene product; and

(c) the antibody and reagent means each being
present in amounts effective to perform the immunoassay.

39. The kit of claim 37 or 38 wherein said reagent means
for detecting binding is selected from the group
consisting of fluorescence detection, radioactive decay
detection, enzyme activity detection or colorimetric
detection.
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40. A kit for assaying for the presence of a CF gene by 
hybridization comprising: 

(a) an oligonucleotide probe which specifically 
binds to a mutant CF gene; 

(b) reagent means for detecting the hybridization 
of the oligonucleotide probe to a mutant CF gene having a 
mutation at a protein position selected from the group 
consisting of protein positions 85, 148, 178, 455, 493, 
507, 542, 549, 551, 560, 563, 574, 1077 and 1092; and

(c) the probe and reagent means each being present 
in amounts effective to perform the hybridization assay. 



41. A kit for assaying for the presence of a CF gene by 
hybridization comprising: 

(a) an oligonucleotide probe which specifically
binds to a mutant CF gene;

(b) reagent means for detecting the hybridization
of the oligonucleotide probe to a mutant CF gene having a
mutation at a DNA sequence position selected from the
group consisting of DNA sequence positions 129, 556,
621+1, 711+1, 1717-1 and 3659; and

(c) the probe and reagent means each being present 
in amounts effective to perform the hybridization assay. 
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42. An animal comprising a heterologous cell system 
comprising a recombinant cloning vector of claim 9 which 
induces cystic fibrosis symptoms in said animal. 

43. The animal of claim 42 wherein said animal is a
mammal.

44. The animal of claim 43 wherein said mammal is a
rodent.

45. The animal of claim 44 wherein said rodent is a
mouse.

46. In a polymerase chain reaction to amplify a selected
exon of a cDNA sequence of Figure 1, the use of
oligonucleotide primers from intron portions near the 5'
and 3' boundaries of the selected exon of Figure 18.

47. In a polymerase chain reaction of claim 46, the use
of oligonucleotide primers Xi-5 and Xi-3 of Table 5 where
X is the exon number 1, 3, 4, 5, 6a, 6b, 7 through 13,
14a, 14b, 15 and 16, 17a, 17b and 18 through 24.
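
The primer naming pattern in claim 47 can be expanded mechanically. The following is a minimal sketch that enumerates the exon labels listed in the claim and builds the corresponding Xi-5 and Xi-3 names. It assumes the names are formed by simple substitution of the exon label for X. The function names are illustrative, and the primer sequences themselves (Table 5) are not reproduced here.

```python
def exon_labels() -> list[str]:
    """Exon labels as enumerated in claim 47 (note that exon 2 is not listed)."""
    labels = ["1", "3", "4", "5", "6a", "6b"]
    labels += [str(n) for n in range(7, 14)]        # 7 through 13
    labels += ["14a", "14b", "15", "16", "17a", "17b"]
    labels += [str(n) for n in range(18, 25)]       # 18 through 24
    return labels


def primer_pairs() -> dict[str, tuple[str, str]]:
    """Map each exon label X to its (Xi-5, Xi-3) primer names."""
    return {x: (f"{x}i-5", f"{x}i-3") for x in exon_labels()}


if __name__ == "__main__":
    for exon, (five_prime, three_prime) in primer_pairs().items():
        print(f"exon {exon}: {five_prime}, {three_prime}")
```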
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FIG/I. 



AATTGGAAGCAAATGACATCACAGCAGGTCAGAGAAAAAGGGTTGAGCGGCAGGCACCCA 
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181 
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GAGTAGTAGGTCTTTGGCATTAGGAGCTTGAGCCCAGACGGCCCTAGCAGGGACCCCAGC 

MQ RSPLEKASVVSKLF 
GCCCGAGAGACCATGCAGAGGTCGCCTCTGGAAAAGGCCAGCGTTGTCTCCAAACTTTTT 



F d W 
rTCAQCTGC 



TRPILRKGYRQRLELSD 
TTCACJCTGGACCAGACCAATTTTGAGGAAAGGATACAGACAGCGCCTGGAATTGTCAGAC 

I Y Q I PSVDSADNLSEKLEriE 
ATATACCAAATCCCTTCTGTTGATTCTGCTGACAATCTATCTGAAAAATTGGAAAOAGAA 

WORE L ASKKNPKL I NALRRC 
TGGGATAGAGAGCTGGCTTCAAAGAAAAATCCTAAACTCATTAATGCCCTTCGGCGATGT 



TTTTTCTGG AGATTTATGTTCTATG t» \£ CI I f 1 Ft I AT 1 TAGGgIsLu^i; A cV/> AAGC& 



V O P L 



JjGRIIASYDPDNKEE 



I L L I H P 



d: 



A l " F V W ■. I . . A .. ? L ° V » T' *■ M ft T."71 W 



R K A A 



16 



36 



56 



76 



LJL" R E r mfygtft, yl g|f, v t k rf 96 



116 



GTACAGCCTCTCTTACTGGGAAGAATCATAGCTTCCTATGACCCGGATAACAAGGAGGAA 

R ISTAIYLGTCT. CLLPTVP T~T1 l 36 
CGCTCTATCGCGATTTATCTAGGCATAGGCTTATGCCTTCTCTTTATTGTGAGGACACTG 



FGLHKIGMQMRIAM 156 



CTCCTACACCCAGCCATTTTTGGCCTTCATCACATTGGAATGCAGATGAGAATAGCTATG 
FSLIYKKlTLKLSS RVLDKIS 176 

tttagtttgatttataagaagKctttaaagctgtcaagccgtgttctagataaaataagt 

IGQLVSLLSNNLNKFD E I g I L a! 196 
ATTGGACAACTTGTTAGTCTCCTTTCCAACyiACCTGAACAAATTTGATGAAK 



216 



TTGGCACATTTCGTGTGGATCGCTCCTTTGC^ 

ELL Ql A S A P C R L fi p T, T V T. a T. Pi 
GAGTTGTTACAGGCGTCTGCCTCCTCTGGACMGGraTCCTGATA 

I 0 ft G L Q I RMMMKYRDQRAGK I S 
CAGGCTGGGCTAGGGAGAATGATGATGAAGTACAGAGATCAGAGAGCTGGGAAGATCAGT 

ERLVITSEMIENIQSVKAYC 276 
GAAAGACTTGTGATTACCTCAGAAATGATTGAAAATATCCAATCTGTTAAGGCATACTGC 

WEEAME KMIENLRdTELKLT 
TG^GAAGAAGCAATGGAAAAAATGATTGAAAACTTAAGACWACAGAACTGAAACTGACT 



236 



256 



296 



316 



356 



Y V R Y F N S ISAPFPsgpfI 
CGGAAGGCAGCCTATGTGAGATACTTCAATAGCTCAGCCTTCTTCTTCTC^GGGTTCTTT 

IVVFLSVLPYA LZD Vl G T T 1, 5 V fl 336 
GTGGTGTTTTTATCTGTGCCTCCCTATG^ 

LL TTISFCTVT. R MAV^ T R Q F P W 
TTCACCACCATCTCATTCTGCATTGTTCTGCGCATGGCGGTCACTCGGCAATTTCCCTGG 

GCTGTACAAACATGGTATGACTCTCTCGGAGC^ 376 

KQEYKTLEYNLTTTEVVMEN 396 
AAGCAAGAATATAAGACATTGGAATATAACTTAACGACTACAGAAGTAGTGATGGAGAAT 

VT AFWEElGFGELFEKAKONK 416 
GTAACAGCCTTCTGGGAGGAGpGATTTGGGGAATTATTTGAGAAAGCAAAACAAAACAAT 
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FIG. 1 (cont'd) 



N N R K T S N 



"""»*»ni>DD5LFFSNrsx.T 
AACAATAGAAAAACTTCTAATGGTGATGACAGCCTCTTCTTCAGTAA TTTCTCACTTCTT 

GTPVLKD I N F K I RRCOT. LAV 456 

GGTACTCCTGTCCTGAAAGATATTAATTTCAAGAT AGAAAGAGGACAGTTGTTGGCGGTT 



AGS 



G A G Kl T S 



LMHIMGELE 



GCTGGATCCACTGGAGCAGGCAADACTTCACTTCTAATGATGATTATGGGAGAACTGGAG 
P S E G K ® 



IKACOLEE 



TACAGAAGCGTCATCAAAGCATGCCAACTAGAAGAG bACATCTCCAAGTTTGCAGAGAAA 



NIVLGEGC 



GACAATATAGTTCTTGGAGAAGGTGGAATCACACTGAGTGGAGGTCAACGAGCAAGAATT 



g G G Q R ^ p j 



L A 



Sckc 



A V 



T CT TT AG C AAGhGCAGTAT AC AMG ATGC 



P V L T E K E 



F E 



TACCTAGATGTWTAACAGAAAAAGAAAT 

NKTRILVTSKMEHLKKADKI 616 
AACAAAACTAGGATTTTGGTCACTTCTAAAATGGAACATTTAAAGAAAGCTGACAAAATA 

LILHEGSSYFYGTFSELONL 636 
TTAATTTTGCATGAAGGTAGCAGCTATTTTTATGGGACATTTTCAGAACTCCAAAATCTA 
e 

OPDFSSKLMGCDSFDQFSAE 656 
CAGCCAGACTTTAGCTCAAAACTCATGGGATGTGATTCTTTCGACCAATTTAGTGCAGAA 

RRNS1LTETLHRFSLEGDAP 676 
AGAAGAAATTCAATCCTAACTGAGACCTTACACCGTTTCTCATTAGAAGGAGATGCTCCT 



C K L M A 



V N Q G Q N 



H R 



476 



496 



CCTTCAGAGGGTAAAATTAAGCACAGTGGAAG AATTTCATTCTCT 
IMPGTlK EKII^Pevo Y D E Y R Si fi 

attatgcctggcaccattaaagaaaatatcatctt tggtgtttcctatgatgLtataga 



536 



556 



576 



596 



696 



VSWTETKKQSFKQTGEFGEK 
GTCTCCTGGACAGAAACAAAAAAACAATCTTTTAAACAGACTGGAGAGTTTGGGGAAAAA 

RKN5 I LN.P I N S I R K F S IVOK 716 
AGGAAGAATTCTATTCTCAATCCAATCAACTCTATACGAAAATTTTCCATTGTGCAAAAG 

TPLQMNGIEEDSDEPLERRL 736 

ACTCCCTTACAAATGAATGGCATCGAAGAGGATTCTGATGAGCCTTTAGAGAGAAGGCTG 
o 

SLVPDSEQGEAI LPRT SVIS 7Si; 
TCCTTAGTACCAGATTCTGAGCAGGGAGAGGCGATACTGCCTCGCATCAGCGTGATCAGC 

TGPTLQARRRQSVLNLMTHS 776 
ACTGGCCCCACGCTTCAGGCACGAAGGAGGCAGTCTGTCCTGAACCTGATGACACACTCA 



9 • • q 

GrTAACCAAGGTCAGAACATTCACCGAAAGACAACAGCATCCACACGAAAAGTGTCACTG ? * 6 

APQANLTELDIYSRRLSOET 816 
GCCCCTCAGGCAAACTTGACTGAACTGGATATATATTCAAGAAGGTTATCTCAAGAAACT 
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FIG.1 (cont'd) 

GLEISEElNEEDLKlECFFDD 836 
2581 GGCTTGGAAATAAGTGAAGAAATTAACGAAG AAGACTTAAA^AGTGCCTTTTTGATGAT 

MESIPAVTT WNTYLRY ITVH 856 

2 641 atggagagcataccagcagtgactacatggaacacataccttcgatatattactgtccac 

K s lItfvltwclvtpt. i f v & &l 876 
2701 aagagcttaatttttgtgctaatttggtgcttagtaatttttctggcagaggtggctgct 

IS L Y V3l WLLGNjTPLQDKcTsT 896 
2761 TCTTTGGTTGTGCTGTGGCTCCTTGGAAAfcACTCCTCTTCAAGACAAAGGGAATAGTACT 

T 

HSRNNSYAV I ITSTS I S Y v y p I 91* 
2821 CATAGTAGAAATAACAGCTATGCAGTGATTATCACCAGCACCAGTTCGTATTATGTGTTT 

lY I YVGVADTLLAMGPpI R G L P 
2881 TACATTTACGTGGGAGTAGCCGACACTTTGCTTGCTATGGGATTCTTCAGAGGTCTACCA 

LVHTLI TVSK I LHHKMLHSV 956 
2 941 CTGGTGCATACTCTAATCACAGTGTCGAAAATTTTACACCACAAAATGTTACATTCTGTT 

3001 CTTCAAGCACCTATGTCAACCCTCAACACGTTGA^ 



936 



LQAPMSTLNTLKAbGILNRF 976 
TTCAAGCACCTATGTCAACCCTCAACACGTTGAAAGCAGbTGGGATTCTTAATAGATTC 

5KDIAI LDDLLPL T I I F D F^I — 51 996 
3061 TCCAAAGATATAGCAATTTTGGATGACCTTCTGCCTCTTAC CATATTTGACTTCATCCAq 



3121 



GAIAVVA VLl Q P | Y I F» 1016 



TTGTTATTAATTGTGATTGGAGCTATAGCAGTTGTCGCAGTTTTACAACC CTACATCTTT 

3181 GTCGCAACAGTGCCAGTGATAGTGGCT 1036 

SQQLKQLESEGRSP IFTHLV 1056 
3241 TCACAGCAACTCAAACAACTGGAATCTGAAGGCAGGAGTCCAATTTTCACTCATCTTGTT 
• # 

TSLKGLWTLRAFGRQPYFET 1076 
ACAAGCTTAAAAGGACTATGGACACTTCGTGCCTTCGGACGGCAGCCTTACTTTGAAACT 



3301 
3361 



LFHKALNLHTANMPT vt i _ , MJ . 

CTGTTCCACAAAGCTCTGAATTTACATACTC 1096 

R W F 0 M R j I E M T P v I — P — p — t — 1 — 7t — m — 3 
3421 CGCTGGTTCCAAATGAGAATAGAAATGA^ 1116 
p - ^ ' S I L T T |q ") 

3481 atttccattttaacaaca^LaagL gaaggaag Sggtattatc 1136 
3541 ^^^^^^^^a^g¥g^^^atagmgtggatagcttI 1156 

MRSVSRVFKFI DHPTF. TKPt 

3601 atgcgatctgtgagcc<»ctctttaagttcattgacattcc^cagaaggtaaacctacc 

K S T 

3661 aagtcaaccaaaccatacaagaatggccaact^ 1196 

HVKKDD 1 W P SGGOMTVvdt m , ^, * 
3721 CACGTGAAC^GATGACATCTGGCCCTCAGGGGGCC^TCACTGTCAAAGATCTCACA 

3761 CCAaLtACACAGAAGgWIa ^ 1236 

3841 ggcc^a^tcggcctcttggg^g aactggaAgggm 1256 
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FIG.1. (cont'd) 

FLRLLNTECEIQIDGV3WDS 1276 
3901 TTTTTGAGACTACTGAACACTGAAGGAGAAATCCAGATCGATGGTGTGTCTTGGGATTCA 

ITLQOWRKA FGVIPOKVFir 1296 
3961 ATAACTTTGCAACAGTGGAGGAAAGCCTTTGGAGTGATACCACAGAAAGTATTTATTTTT 

SCTFRKKLDPYEQW3DQEIW 1316 
4021 TCTGGAACATTTAGAAAAAACTTGGATCCCTATGAACAGTGGAGTGATCAAGAAATATGG 

K V A D g I V C LRSVIEOrPGKLP 1336 
4081 AAAGTTGCAGATGAGCTTGGGCTCAGATCTGTGATAGAACAGTTXCCTGGGAAGCTTGAC 

FVLVDGGCV LSHGHKQLMCL 1356 
4H1 TTTGTCCTTGTGGATGGGGGCTGTGTCCTAAGCCATGGCCACAAGCAGTTGATGTGCTTG 

ARSVLflKXKILLL PEPflAHI, 1376 

4201 GCTAGATCTGTTCTCAGTAAGGCGAAGATCTTGCTGCTTG 

D P V]T Y Q I I R R TLKQAFADCT 1396 

ATCCAGlt^ACATACCAAATAATTAGAAGAACTCTAAAACAAGCATTTGCTGATTRrAr^ 



4261 GATCCAG1JVACATACCAAATAATTAGAAGAACTCTAAAACAAGCATTTGCTGATTGCACA 

VILCEHRIEAMLECQQF L|V I 
4321 GTAATTCTCTGTGAACACAGGATAGAAGCAATGCTGGAATGCCAACAATTTTTGpTCATA 

4381 



4741 
4801 



EENKVRQYDSIOKLLNERSL 1436 
GAAGAGAACAAAGTGCGGCAGTACGATTCCATCCAGAAACTGCTGAACGAGAGGAGCCTC 

FRQAISPSDRVKLFPHRNSS 14«ifi 
44 41 TTCCGGCAAGCCATCAGCCCCTCCGACAGGGTGAAGCTCTTTCCCCACCGGAACTCAAGC 

KCKSKPQ IAALKEETEEEVO 1476 
4501 AAGTGCAAGTCTAAGCCCCAGATTGCTGCTCTGAAAGAGGAGACAGAAGAAGAGGTGCAA 

D T R L » 

4561 GATACAAGGCTTTAGAGAGCAGCATAAATGTTGACATGGGACATTTGCTCATGGAATTGG l48 ° 
4 681 AAAACAAGGATGAATTAAGTTTTTTTTTAAAAAAGAMCATTTGGTAAGGGGAATTfiaRT. 



Vrm. iwmw. IWTOUAAUIGTTACCTCTGCCTCAG 

AAAACAAGGATGAATTAAGTTTTTTTTTAAAAAAGAAACATTTGGTAAGGGGAATTGAGG 
~S^I^Ii TGGCTCTTGAT ^ TGGCTTCCTGGC ^TAGTCAAATTGTGTGAAAGGTAC 
TTCAAATCCTTGAAGATTTACCAC1TGTGTTTTGCAAGCCAGATTTTCCTGAAMCCCTT 
4 861 GCCATGTGCTAGTAATTGGAAAGGCAGCTCTAAATGTCAATCAGCCTAGTTGATCAGCTT 

4 921 ATTGTCTAGTGAAACTCGTTAATTTGTAGTGTTGGAGAAGAACTGAMTCWACTOCTTA 

^GTTATGATTAAGTAATGATAACTGGAAACTTCAGCGGTTTATATAAGCTOGTATTCCT 
504 WTTCTCTCCTCTCCCCATGATGTTTAGAAACACAACTATATTGTOTCCTAAGCATTCCA 

5 01 ACTATCTCATTTCCAAGCAAGTATTAGAATACCACAGGAACCACAAGACTGCACATCAAA 
5161 ATATGCCCCATTCAACATCTAGTGAGCAGTCAGGAAAGAGAACTTCCAGATCCTGGAAAT 
5221 CAGGGTTAGTATTGTCCAGGTCTACCAAAAATCTCAATATTTCAGATAATCACAATACAT 
11**} CCCTTACCTGGGAAAGGGCTGTTATAATCTTTCACAGGGGACAGGATGGTTCCCWG^TG 
IA} ^ G ^ET GATATGCC ^^ CCC ^ CTCCAG ^ GTGACA AGCTCACAGACCTTTGAACT 

AGAGTTTAGCTGGAAAAGTATGTTAGTGC^TTGTCACAGGACAGCCCTTC^TCCACA 
lit} 5^^^^^^^*®^T6^TAAGTAGATAGGCCATGGGCACTGTGGCTAGACACACA 
5521 TGAAGTCCAAGCATTTAGATGTATAGGTTGATGGTGGTATGTTTTCAGGCTAGATGTATG 
5581 TACTTCATGCTGTCTACACTAAGAGAGAATGAGAGAC^CACTGAAGAAGCAC^ 
5641 AATTAGTTTTATATGCTTCTGTTTTATAATTTTGTGAAGCAAAATOCTCTCTCTAGGAAA 
5701 ^TTTATTTTAATAATGTTTCAAACATATATTACAATGCTGTATTTTAAAAGAATGATTA 
5761 TGAATTACATTTGTATAAAATAATTTTTATATTTGAAATATTGACTTTT^TGGCACTAG 
5821 TATTTTTATGAAATATTATGTTAAAACTGGGACAGGGGAGAACCTAGGGTGA^OTAACC 
5881 AGGGGCCATGAATCACCTTTTGGTCTGGAGGGAAGCCTTGGGGCTGATCGAGOTOTTGCC 
loo} ^ AG "G™GATTCa:AGCXAGACAra^ 

6 °°\ ACCACCAGTCTGACTGTTTCCATCAAGGGTACACTGCCTTCTCAACTCCAAACTGACTCT 
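
The nucleotide and residue numbering of Figure 1 can be related to each other. The row beginning at nucleotide 4561 opens with GAT, the codon for residue 1477 (Asp), which places the first base of codon 1 at nucleotide 133. The stop codon then occupies nucleotides 4573 to 4575, consistent with the range recited in claim 2. The following is a minimal sketch of the conversion under that inferred offset. The function names are illustrative, and reading labels such as 621+1 or 1717-1 as intronic positions counted from the nearest exon nucleotide is the customary convention rather than something spelled out in the claims.

```python
CODING_START = 133   # first base of codon 1, inferred from Figure 1 (assumption)
LAST_CODON = 1480    # last residue of the polypeptide per claim 1


def first_base_of(codon: int) -> int:
    """cDNA position of the first nucleotide of the given codon."""
    if not 1 <= codon <= LAST_CODON + 1:      # LAST_CODON + 1 is the stop codon
        raise ValueError("codon number out of range")
    return CODING_START + 3 * (codon - 1)


def codon_of(nt_pos: int) -> int:
    """Codon number containing the given cDNA position."""
    if nt_pos < CODING_START:
        raise ValueError("position lies upstream of the coding region")
    codon = (nt_pos - CODING_START) // 3 + 1
    if codon > LAST_CODON + 1:
        raise ValueError("position lies downstream of the stop codon")
    return codon


def parse_label(label: str) -> tuple[int, int]:
    """Split '621+1' into (621, 1), '1717-1' into (1717, -1), '3659' into (3659, 0)."""
    for sign in "+-":
        if sign in label:
            base, offset = label.split(sign)
            return int(base), int(sign + offset)
    return int(label), 0


if __name__ == "__main__":
    for label in ("129", "556", "621+1", "711+1", "1717-1", "3659"):
        base, offset = parse_label(label)
        where = f"codon {codon_of(base)}" if base >= CODING_START else "upstream of codon 1"
        print(label, base, offset, where)
```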
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FIG. 3A FIG. 3B FIG.3C FIG. 3D 
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FIG. 3E 
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FIG. AC 
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FIG. 9 
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FIG. 10C 
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CF son 




Carrier father 



          501                        510
          ThrIleLysGluAsnIleIlePheGlyValSer
Normal    ACCATTAAAGAAAATATCATCTTTGGTGTTTCC

          ThrIleLysGluAsnIle   PheGlyValSer
ΔI507     ACCATTAAAGAAAATATC   TTTGGTGTTTCC

          ThrIleLysGluAsnIleIle   GlyValSer
ΔF508     ACCATTAAAGAAAATATCAT   TGGTGTTTCC
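
The three sequences in this comparison can be checked by translating them with the standard genetic code. The following is a minimal sketch: the allele names are illustrative ASCII stand-ins for the labels in the figure, and exact substring matching is only a crude stand-in for hybridization under stringent conditions. It also tests the 21-nucleotide probe of claim 8 against each allele and derives its complement.

```python
# Standard genetic code, built in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Sequences as printed in the figure (codons 501 onward), gaps removed.
ALLELES = {
    "normal": "ACCATTAAAGAAAATATCATCTTTGGTGTTTCC",
    "deltaI507": "ACCATTAAAGAAAATATCTTTGGTGTTTCC",
    "deltaF508": "ACCATTAAAGAAAATATCATTGGTGTTTCC",
}
PROBE = "AAAGAAAATATCTTTGGTGTT"  # claim 8, spaces removed


def translate(seq: str) -> str:
    """Translate an in-frame DNA sequence into one-letter amino acid codes."""
    seq = seq.upper()
    if len(seq) % 3:
        raise ValueError("sequence length must be a multiple of three")
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


def reverse_complement(seq: str) -> str:
    """Return the complementary strand read 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


if __name__ == "__main__":
    for name, seq in ALLELES.items():
        print(f"{name:10s} {translate(seq):12s} probe match: {PROBE in seq}")
    print("probe complement:", reverse_complement(PROBE))
```

Both deletions remove exactly three nucleotides, so the reading frame downstream is preserved and only one residue is lost in each case. Under exact matching, the probe sequence of claim 8 is found only in the allele carrying the I507 deletion.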
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